Welcome! Log In Create A New Profile

Advanced

1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

Posted by PiBa-NL 
Hi Haproxy List,

I upgraded to 1.8.7 (coming from 1.8.3) and found i could no-longer use
one of our IIS websites. The login procedure thats using windows
authentication / ntlm seems to fail..
Removing option http-tunnel seems to fix this though. Afaik http-tunnel
'should' switch to tunnelmode after the first request and as such should
have no issue sending the credentials the the server.?.

Below are:  config / haproxy -vv / tcpdump / sess all

Is it a known issue? Is there anything else i can provide?

Regards,

PiBa-NL (Pieter)

-----------------------------
# Automaticaly generated, dont edit manually.
# Generated on: 2018-04-10 21:00
global
    maxconn            1000
    log            192.168.8.10    local1    info
    stats socket /tmp/haproxy.socket level admin
    gid            80
    nbproc            1
    nbthread            1
    hard-stop-after        15m
    chroot                /tmp/haproxy_chroot
    daemon
    tune.ssl.default-dh-param    2048
    defaults
    option log-health-checks


frontend site.domain.nl2
    bind            192.168.8.5:443 name 192.168.8.5:443  ssl  crt
/var/etc/haproxy/site.domain.nl2.pem crt-list
/var/etc/haproxy/site.domain.nl2.crt_list
    mode            http
    log            global
    option            httplog
    option            http-tunnel
    maxconn            100
    timeout client        1h
    option tcplog
    default_backend website-intern_http_ipvANY

backend site-intern_http_ipvANY
    mode            http
    log            global
    option            http-tunnel
    timeout connect        10s
    timeout server        1h
    retries            3
    server            site 192.168.13.44:443 ssl  weight 1.1 verify none

-----------------------------
[2.4.3-RELEASE][[email protected]_5.local]/root: haproxy -vv
HA-Proxy version 1.8.7 2018/04/07
Copyright 2000-2018 Willy Tarreau <[email protected]>

Build options :
  TARGET  = freebsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -pipe -fstack-protector -fno-strict-aliasing
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
-fno-strict-overflow -Wno-address-of-packed-member -Wno-null-dereference
-Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1
USE_PCRE_JIT=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
Running on OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
        [TRACE] trace
        [COMP] compression
        [SPOE] spoe
-----------------------------
tcpdump of : Client 8.32>Haproxy 8.5:

21:09:13.452118 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags , seq
1417754656, win 8192, options [mss 1260,nop,wscale 8,nop,nop,sackOK],
length 0
21:09:13.452312 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [S.], seq
1950703403, ack 1417754657, win 65228, options [mss 1260,nop,wscale
7,sackOK,eol], length 0
21:09:13.453030 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack
1, win 260, length 0
21:09:13.457740 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq
1:190, ack 1, win 260, length 189
21:09:13.457762 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack
190, win 510, length 0
21:09:13.459503 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], seq
1:1261, ack 190, win 511, length 1260
21:09:13.459516 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], seq
1261:2521, ack 190, win 511, length 1260
21:09:13.459527 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.], seq
2521:2686, ack 190, win 511, length 165
21:09:13.460342 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack
2686, win 260, length 0
21:09:13.478984 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq
190:316, ack 2686, win 260, length 126
21:09:13.479038 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack
316, win 510, length 0
21:09:13.480105 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.], seq
2686:2737, ack 316, win 511, length 51
21:09:13.490136 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq
316:615, ack 2737, win 260, length 299
21:09:13.490159 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack
615, win 509, length 0
21:09:13.502733 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], seq
2737:3997, ack 615, win 511, length 1260
21:09:13.502745 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.], seq
3997:4314, ack 615, win 511, length 317
21:09:13.502970 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack
4314, win 260, length 0
<User fills in the password popup from the browser this takes a few seconds>
21:09:25.294174 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq
615:992, ack 4314, win 260, length 377
21:09:25.294233 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack
992, win 508, length 0
<After a little while the connection is lost..>
21:11:26.433054 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.], seq
4314:4345, ack 992, win 508, length 31
21:11:26.433087 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [F.], seq
4345, ack 992, win 508, length 0
21:11:26.434317 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack
4346, win 260, length 0
21:11:26.446968 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [F.], seq
992, ack 4346, win 260, length 0
21:11:26.446994 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack
993, win 508, length 0

tcpdump of: Haproxy 13.5> Webserver 13.44:

21:09:13.490504 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags , seq
2510965801, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val
1036282720 ecr 0], length 0
21:09:13.492572 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [S.],
seq 3868280669, ack 2510965802, win 8192, options [mss 1460,nop,wscale
8,sackOK,TS val 4405365 ecr 1036282720], length 0
21:09:13.492596 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
1, win 513, options [nop,nop,TS val 1036282720 ecr 4405365], length 0
21:09:13.492737 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
seq 1:518, ack 1, win 513, options [nop,nop,TS val 1036282720 ecr
4405365], length 517
21:09:13.495339 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [.], seq
1:1449, ack 518, win 260, options [nop,nop,TS val 4405365 ecr
1036282720], length 1448
21:09:13.495356 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
1449, win 501, options [nop,nop,TS val 1036282729 ecr 4405365], length 0
21:09:13.495363 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
seq 1449:2660, ack 518, win 260, options [nop,nop,TS val 4405365 ecr
1036282720], length 1211
21:09:13.495374 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
2660, win 492, options [nop,nop,TS val 1036282729 ecr 4405365], length 0
21:09:13.497287 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
seq 518:644, ack 2660, win 513, options [nop,nop,TS val 1036282729 ecr
4405365], length 126
21:09:13.500555 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
seq 2660:2711, ack 644, win 259, options [nop,nop,TS val 4405366 ecr
1036282729], length 51
21:09:13.500570 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
2711, win 512, options [nop,nop,TS val 1036282729 ecr 4405366], length 0
21:09:13.500799 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
seq 644:943, ack 2711, win 513, options [nop,nop,TS val 1036282729 ecr
4405366], length 299
21:09:13.502465 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [.], seq
2711:4159, ack 943, win 258, options [nop,nop,TS val 4405366 ecr
1036282729], length 1448
21:09:13.502483 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
4159, win 501, options [nop,nop,TS val 1036282729 ecr 4405366], length 0
21:09:13.502492 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
seq 4159:4288, ack 943, win 258, options [nop,nop,TS val 4405366 ecr
1036282729], length 129
21:09:13.502502 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.], ack
4288, win 500, options [nop,nop,TS val 1036282729 ecr 4405366], length 0
<User/Password is never send>
21:11:26.432105 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [R.],
seq 4288, ack 943, win 0, length 0

-----------------------------
'sess all' before and after the credentials..

[2.4.3-RELEASE][[email protected]_5.itm.local]/root:
/usr/local/pkg/haproxy/haproxy_socket.sh show sess all

show sess all

0x8024adf80: [10/Apr/2018:21:09:13.453304] id=7 proto=tcpv4
source=192.168.8.32:51658
  flags=0x4ce, conn_retries=3, srv_conn=0x8024f6000, pend_pos=0x0
  frontend=portal.itemedical.nl2 (id=2 mode=http),
listener=192.168.8.5:443 (id=1) addr=192.168.8.5:443
  backend=synergy-intern_http_ipvANY (id=3 mode=http)
addr=192.168.13.5:6015
  server=portalServer (id=1) addr=192.168.13.44:443
  task=0x80242d8c0 (state=0x08 nice=0 calls=4 exp=7s tmask=0x1 age=3s)
  txn=0x80244b8c0 flags=0x100000 meth=1 status=401 req.st=MSG_TUNNEL
rsp.st=MSG_TUNNEL waiting=0
  si[0]=0x8024ae1c8 (state=EST flags=0x08 endp0=NONE:0x8024a8350
exp=<NEVER>, et=0x000)
  si[1]=0x8024ae1f0 (state=EST flags=0x118 endp1=NONE:0x8024a8380
exp=<NEVER>, et=0x000)
  co0=0x80243c480 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
target=LISTENER:0x80243c300
      flags=0x80203306 fd=10 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  co1=0x80243c600 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
target=SERVER:0x8024f6000
      flags=0x00203306 fd=11 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  req=0x8024adf90 (f=0x8848000 an=0x0 pipe=0 tofwd=-1 total=270)
      an_exp=<NEVER> rex=59m57s wex=<NEVER>
      buf=0x7d2c18 data=0x7d2c2c o=0 p=0 req.next=0 i=0 size=0
  res=0x8024adfd0 (f=0xc8048200 an=0x0 pipe=0 tofwd=-1 total=1548)
      an_exp=<NEVER> rex=59m57s wex=<NEVER>
      buf=0x802507840 data=0x802507854 o=0 p=0 rsp.next=0 i=0 size=16384
0x8024aed80: [10/Apr/2018:21:09:16.365440] id=8 proto=unix_stream
source=unix:1
  flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
  backend=<NONE> (id=-1 mode=-)
  server=<NONE> (id=-1)
  task=0x80242d780 (state=0x02 nice=-64 calls=1 exp=10s tmask=0x1 age=0s)
  si[0]=0x8024aefc8 (state=EST flags=0x08 endp0=NONE:0x8024a8530
exp=<NEVER>, et=0x000)
  si[1]=0x8024aeff0 (state=EST flags=0x4018 endp1=APPCTX:0x80243cc00
exp=<NEVER>, et=0x000)
  co0=0x80243ca80 ctrl=unix_stream xprt=RAW mux=PASS data=STRM
target=LISTENER:0x80243c180
      flags=0x00203306 fd=12 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  app1=0x80243cc00 st0=7 st1=0 st2=3 applet=<CLI> tmask=0x1
  req=0x8024aed90 (f=0x40c08202 an=0x0 pipe=0 tofwd=-1 total=15)
      an_exp=<NEVER> rex=10s wex=<NEVER>
      buf=0x8025b3340 data=0x8025b3354 o=0 p=0 req.next=0 i=0 size=16384
  res=0x8024aedd0 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1398)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x802501900 data=0x802501914 o=1398 p=1398 rsp.next=0 i=0
size=16384

[2.4.3-RELEASE][[email protected]_5.itm.local]/root:
/usr/local/pkg/haproxy/haproxy_socket.sh show sess all
show sess all

0x8024adf80: [10/Apr/2018:21:09:13.453304] id=7 proto=tcpv4
source=192.168.8.32:51658
  flags=0x4ce, conn_retries=3, srv_conn=0x8024f6000, pend_pos=0x0
  frontend=portal.itemedical.nl2 (id=2 mode=http),
listener=192.168.8.5:443 (id=1) addr=192.168.8.5:443
  backend=synergy-intern_http_ipvANY (id=3 mode=http)
addr=192.168.13.5:6015
  server=portalServer (id=1) addr=192.168.13.44:443
  task=0x80242d8c0 (state=0x04 nice=0 calls=5 exp=59m43s tmask=0x1 age=16s)
  txn=0x80244b8c0 flags=0x100000 meth=1 status=401 req.st=MSG_TUNNEL
rsp.st=MSG_TUNNEL waiting=0
  si[0]=0x8024ae1c8 (state=EST flags=0x08 endp0=NONE:0x8024a8350
exp=<NEVER>, et=0x000)
  si[1]=0x8024ae1f0 (state=EST flags=0x118 endp1=NONE:0x8024a8380
exp=<NEVER>, et=0x000)
  co0=0x80243c480 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
target=LISTENER:0x80243c300
      flags=0x80203306 fd=10 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  co1=0x80243c600 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
target=SERVER:0x8024f6000
      flags=0x00203306 fd=11 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  req=0x8024adf90 (f=0x8848000 an=0x0 pipe=0 tofwd=-1 total=270)
      an_exp=<NEVER> rex=59m43s wex=<NEVER>
      buf=0x7d2c18 data=0x7d2c2c o=0 p=0 req.next=0 i=0 size=0
  res=0x8024adfd0 (f=0x88048000 an=0x0 pipe=0 tofwd=-1 total=1548)
      an_exp=<NEVER> rex=59m43s wex=<NEVER>
      buf=0x7d2c18 data=0x7d2c2c o=0 p=0 rsp.next=0 i=0 size=0
0x8024aed80: [10/Apr/2018:21:09:29.621432] id=9 proto=unix_stream
source=unix:1
  flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
  backend=<NONE> (id=-1 mode=-)
  server=<NONE> (id=-1)
  task=0x80242d780 (state=0x02 nice=-64 calls=1 exp=10s tmask=0x1 age=0s)
  si[0]=0x8024aefc8 (state=EST flags=0x08 endp0=NONE:0x8024a8530
exp=<NEVER>, et=0x000)
  si[1]=0x8024aeff0 (state=EST flags=0x4018 endp1=APPCTX:0x80243cc00
exp=<NEVER>, et=0x000)
  co0=0x80243ca80 ctrl=unix_stream xprt=RAW mux=PASS data=STRM
target=LISTENER:0x80243c180
      flags=0x00203306 fd=12 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
  app1=0x80243cc00 st0=7 st1=0 st2=3 applet=<CLI> tmask=0x1
  req=0x8024aed90 (f=0x40c08202 an=0x0 pipe=0 tofwd=-1 total=15)
      an_exp=<NEVER> rex=10s wex=<NEVER>
      buf=0x802507840 data=0x802507854 o=0 p=0 req.next=0 i=0 size=16384
  res=0x8024aedd0 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1393)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x802501900 data=0x802501914 o=1393 p=1393 rsp.next=0 i=0
size=16384
Hi List / Willy,

Removing the line below 'fixes' my issue with kqueue poller and NTLM
authentication with option http-tunnel..
Though i'm sure something is then broken then horribly also (CPU go's
100%..). And i'm not sure what the proper fix would be. (ive got to
little knowledge of what the various flags do and C++ aint a language i
normally ever look at.. )
The 'breaking' commit was this one:
http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=f839593dd26ec210ba66d74b2a4c2040dd1ed806

Can you take a new look at that piece of code? (as the commit was yours ;) )
Thanks in advance :).

Regards,
PiBa-NL (Pieter)

 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece..49e7302 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -78,7 +78,7 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
             else if (fdtab[fd].polled_mask & tid_bit)
                 EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_DELETE,
0, 0, NULL);

-            HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
+//            HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
         }
     }
     if (changes)



Op 10-4-2018 om 23:11 schreef PiBa-NL:
> Hi Haproxy List,
>
> I upgraded to 1.8.7 (coming from 1.8.3) and found i could no-longer
> use one of our IIS websites. The login procedure thats using windows
> authentication / ntlm seems to fail..
> Removing option http-tunnel seems to fix this though. Afaik
> http-tunnel 'should' switch to tunnelmode after the first request and
> as such should have no issue sending the credentials the the server.?.
>
> Below are:  config / haproxy -vv / tcpdump / sess all
>
> Is it a known issue? Is there anything else i can provide?
>
> Regards,
>
> PiBa-NL (Pieter)
>
> -----------------------------
> # Automaticaly generated, dont edit manually.
> # Generated on: 2018-04-10 21:00
> global
>     maxconn            1000
>     log            192.168.8.10    local1    info
>     stats socket /tmp/haproxy.socket level admin
>     gid            80
>     nbproc            1
>     nbthread            1
>     hard-stop-after        15m
>     chroot                /tmp/haproxy_chroot
>     daemon
>     tune.ssl.default-dh-param    2048
>     defaults
>     option log-health-checks
>
>
> frontend site.domain.nl2
>     bind            192.168.8.5:443 name 192.168.8.5:443  ssl  crt
> /var/etc/haproxy/site.domain.nl2.pem crt-list
> /var/etc/haproxy/site.domain.nl2.crt_list
>     mode            http
>     log            global
>     option            httplog
>     option            http-tunnel
>     maxconn            100
>     timeout client        1h
>     option tcplog
>     default_backend website-intern_http_ipvANY
>
> backend site-intern_http_ipvANY
>     mode            http
>     log            global
>     option            http-tunnel
>     timeout connect        10s
>     timeout server        1h
>     retries            3
>     server            site 192.168.13.44:443 ssl  weight 1.1 verify none
>
> -----------------------------
> [2.4.3-RELEASE][[email protected]_5.local]/root: haproxy -vv
> HA-Proxy version 1.8.7 2018/04/07
> Copyright 2000-2018 Willy Tarreau <[email protected]>
>
> Build options :
>   TARGET  = freebsd
>   CPU     = generic
>   CC      = cc
>   CFLAGS  = -O2 -pipe -fstack-protector -fno-strict-aliasing
> -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
> -fno-strict-overflow -Wno-address-of-packed-member
> -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
>   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1
> USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1
> USE_PCRE_JIT=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>
> Built with network namespace support.
> Built with zlib version : 1.2.11
> Running on zlib version : 1.2.11
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with PCRE version : 8.40 2017-01-11
> Running on PCRE version : 8.40 2017-01-11
> PCRE library supports JIT : yes
> Built with multi-threading support.
> Encrypted password support via crypt(3): yes
> Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
> Built with Lua version : Lua 5.3.4
> Built with OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
> Running on OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
>
> Available polling systems :
>      kqueue : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
> Total: 3 (3 usable), will use kqueue.
>
> Available filters :
>         [TRACE] trace
>         [COMP] compression
>         [SPOE] spoe
> -----------------------------
> tcpdump of : Client 8.32>Haproxy 8.5:
>
> 21:09:13.452118 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags ,
> seq 1417754656, win 8192, options [mss 1260,nop,wscale
> 8,nop,nop,sackOK], length 0
> 21:09:13.452312 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [S.],
> seq 1950703403, ack 1417754657, win 65228, options [mss
> 1260,nop,wscale 7,sackOK,eol], length 0
> 21:09:13.453030 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.],
> ack 1, win 260, length 0
> 21:09:13.457740 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.],
> seq 1:190, ack 1, win 260, length 189
> 21:09:13.457762 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> ack 190, win 510, length 0
> 21:09:13.459503 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> seq 1:1261, ack 190, win 511, length 1260
> 21:09:13.459516 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> seq 1261:2521, ack 190, win 511, length 1260
> 21:09:13.459527 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.],
> seq 2521:2686, ack 190, win 511, length 165
> 21:09:13.460342 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.],
> ack 2686, win 260, length 0
> 21:09:13.478984 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.],
> seq 190:316, ack 2686, win 260, length 126
> 21:09:13.479038 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> ack 316, win 510, length 0
> 21:09:13.480105 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.],
> seq 2686:2737, ack 316, win 511, length 51
> 21:09:13.490136 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.],
> seq 316:615, ack 2737, win 260, length 299
> 21:09:13.490159 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> ack 615, win 509, length 0
> 21:09:13.502733 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> seq 2737:3997, ack 615, win 511, length 1260
> 21:09:13.502745 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.],
> seq 3997:4314, ack 615, win 511, length 317
> 21:09:13.502970 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.],
> ack 4314, win 260, length 0
> <User fills in the password popup from the browser this takes a few
> seconds>
> 21:09:25.294174 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.],
> seq 615:992, ack 4314, win 260, length 377
> 21:09:25.294233 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> ack 992, win 508, length 0
> <After a little while the connection is lost..>
> 21:11:26.433054 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.],
> seq 4314:4345, ack 992, win 508, length 31
> 21:11:26.433087 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [F.],
> seq 4345, ack 992, win 508, length 0
> 21:11:26.434317 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.],
> ack 4346, win 260, length 0
> 21:11:26.446968 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [F.],
> seq 992, ack 4346, win 260, length 0
> 21:11:26.446994 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.],
> ack 993, win 508, length 0
>
> tcpdump of: Haproxy 13.5> Webserver 13.44:
>
> 21:09:13.490504 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags ,
> seq 2510965801, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS
> val 1036282720 ecr 0], length 0
> 21:09:13.492572 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [S.],
> seq 3868280669, ack 2510965802, win 8192, options [mss 1460,nop,wscale
> 8,sackOK,TS val 4405365 ecr 1036282720], length 0
> 21:09:13.492596 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 1, win 513, options [nop,nop,TS val 1036282720 ecr 4405365], length 0
> 21:09:13.492737 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
> seq 1:518, ack 1, win 513, options [nop,nop,TS val 1036282720 ecr
> 4405365], length 517
> 21:09:13.495339 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [.],
> seq 1:1449, ack 518, win 260, options [nop,nop,TS val 4405365 ecr
> 1036282720], length 1448
> 21:09:13.495356 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 1449, win 501, options [nop,nop,TS val 1036282729 ecr 4405365],
> length 0
> 21:09:13.495363 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
> seq 1449:2660, ack 518, win 260, options [nop,nop,TS val 4405365 ecr
> 1036282720], length 1211
> 21:09:13.495374 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 2660, win 492, options [nop,nop,TS val 1036282729 ecr 4405365],
> length 0
> 21:09:13.497287 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
> seq 518:644, ack 2660, win 513, options [nop,nop,TS val 1036282729 ecr
> 4405365], length 126
> 21:09:13.500555 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
> seq 2660:2711, ack 644, win 259, options [nop,nop,TS val 4405366 ecr
> 1036282729], length 51
> 21:09:13.500570 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 2711, win 512, options [nop,nop,TS val 1036282729 ecr 4405366],
> length 0
> 21:09:13.500799 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [P.],
> seq 644:943, ack 2711, win 513, options [nop,nop,TS val 1036282729 ecr
> 4405366], length 299
> 21:09:13.502465 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [.],
> seq 2711:4159, ack 943, win 258, options [nop,nop,TS val 4405366 ecr
> 1036282729], length 1448
> 21:09:13.502483 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 4159, win 501, options [nop,nop,TS val 1036282729 ecr 4405366],
> length 0
> 21:09:13.502492 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [P.],
> seq 4159:4288, ack 943, win 258, options [nop,nop,TS val 4405366 ecr
> 1036282729], length 129
> 21:09:13.502502 IP 192.168.13.5.6015 > 192.168.13.44.443: Flags [.],
> ack 4288, win 500, options [nop,nop,TS val 1036282729 ecr 4405366],
> length 0
> <User/Password is never send>
> 21:11:26.432105 IP 192.168.13.44.443 > 192.168.13.5.6015: Flags [R.],
> seq 4288, ack 943, win 0, length 0
>
> -----------------------------
> 'sess all' before and after the credentials..
>
> [2.4.3-RELEASE][[email protected]_5.itm.local]/root:
> /usr/local/pkg/haproxy/haproxy_socket.sh show sess all
>
> show sess all
>
> 0x8024adf80: [10/Apr/2018:21:09:13.453304] id=7 proto=tcpv4
> source=192.168.8.32:51658
>   flags=0x4ce, conn_retries=3, srv_conn=0x8024f6000, pend_pos=0x0
>   frontend=portal.itemedical.nl2 (id=2 mode=http),
> listener=192.168.8.5:443 (id=1) addr=192.168.8.5:443
>   backend=synergy-intern_http_ipvANY (id=3 mode=http)
> addr=192.168.13.5:6015
>   server=portalServer (id=1) addr=192.168.13.44:443
>   task=0x80242d8c0 (state=0x08 nice=0 calls=4 exp=7s tmask=0x1 age=3s)
>   txn=0x80244b8c0 flags=0x100000 meth=1 status=401 req.st=MSG_TUNNEL
> rsp.st=MSG_TUNNEL waiting=0
>   si[0]=0x8024ae1c8 (state=EST flags=0x08 endp0=NONE:0x8024a8350
> exp=<NEVER>, et=0x000)
>   si[1]=0x8024ae1f0 (state=EST flags=0x118 endp1=NONE:0x8024a8380
> exp=<NEVER>, et=0x000)
>   co0=0x80243c480 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
> target=LISTENER:0x80243c300
>       flags=0x80203306 fd=10 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   co1=0x80243c600 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
> target=SERVER:0x8024f6000
>       flags=0x00203306 fd=11 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   req=0x8024adf90 (f=0x8848000 an=0x0 pipe=0 tofwd=-1 total=270)
>       an_exp=<NEVER> rex=59m57s wex=<NEVER>
>       buf=0x7d2c18 data=0x7d2c2c o=0 p=0 req.next=0 i=0 size=0
>   res=0x8024adfd0 (f=0xc8048200 an=0x0 pipe=0 tofwd=-1 total=1548)
>       an_exp=<NEVER> rex=59m57s wex=<NEVER>
>       buf=0x802507840 data=0x802507854 o=0 p=0 rsp.next=0 i=0 size=16384
> 0x8024aed80: [10/Apr/2018:21:09:16.365440] id=8 proto=unix_stream
> source=unix:1
>   flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
>   frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
>   backend=<NONE> (id=-1 mode=-)
>   server=<NONE> (id=-1)
>   task=0x80242d780 (state=0x02 nice=-64 calls=1 exp=10s tmask=0x1 age=0s)
>   si[0]=0x8024aefc8 (state=EST flags=0x08 endp0=NONE:0x8024a8530
> exp=<NEVER>, et=0x000)
>   si[1]=0x8024aeff0 (state=EST flags=0x4018 endp1=APPCTX:0x80243cc00
> exp=<NEVER>, et=0x000)
>   co0=0x80243ca80 ctrl=unix_stream xprt=RAW mux=PASS data=STRM
> target=LISTENER:0x80243c180
>       flags=0x00203306 fd=12 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   app1=0x80243cc00 st0=7 st1=0 st2=3 applet=<CLI> tmask=0x1
>   req=0x8024aed90 (f=0x40c08202 an=0x0 pipe=0 tofwd=-1 total=15)
>       an_exp=<NEVER> rex=10s wex=<NEVER>
>       buf=0x8025b3340 data=0x8025b3354 o=0 p=0 req.next=0 i=0 size=16384
>   res=0x8024aedd0 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1398)
>       an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
>       buf=0x802501900 data=0x802501914 o=1398 p=1398 rsp.next=0 i=0
> size=16384
>
> [2.4.3-RELEASE][[email protected]_5.itm.local]/root:
> /usr/local/pkg/haproxy/haproxy_socket.sh show sess all
> show sess all
>
> 0x8024adf80: [10/Apr/2018:21:09:13.453304] id=7 proto=tcpv4
> source=192.168.8.32:51658
>   flags=0x4ce, conn_retries=3, srv_conn=0x8024f6000, pend_pos=0x0
>   frontend=portal.itemedical.nl2 (id=2 mode=http),
> listener=192.168.8.5:443 (id=1) addr=192.168.8.5:443
>   backend=synergy-intern_http_ipvANY (id=3 mode=http)
> addr=192.168.13.5:6015
>   server=portalServer (id=1) addr=192.168.13.44:443
>   task=0x80242d8c0 (state=0x04 nice=0 calls=5 exp=59m43s tmask=0x1
> age=16s)
>   txn=0x80244b8c0 flags=0x100000 meth=1 status=401 req.st=MSG_TUNNEL
> rsp.st=MSG_TUNNEL waiting=0
>   si[0]=0x8024ae1c8 (state=EST flags=0x08 endp0=NONE:0x8024a8350
> exp=<NEVER>, et=0x000)
>   si[1]=0x8024ae1f0 (state=EST flags=0x118 endp1=NONE:0x8024a8380
> exp=<NEVER>, et=0x000)
>   co0=0x80243c480 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
> target=LISTENER:0x80243c300
>       flags=0x80203306 fd=10 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   co1=0x80243c600 ctrl=tcpv4 xprt=SSL mux=PASS data=STRM
> target=SERVER:0x8024f6000
>       flags=0x00203306 fd=11 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   req=0x8024adf90 (f=0x8848000 an=0x0 pipe=0 tofwd=-1 total=270)
>       an_exp=<NEVER> rex=59m43s wex=<NEVER>
>       buf=0x7d2c18 data=0x7d2c2c o=0 p=0 req.next=0 i=0 size=0
>   res=0x8024adfd0 (f=0x88048000 an=0x0 pipe=0 tofwd=-1 total=1548)
>       an_exp=<NEVER> rex=59m43s wex=<NEVER>
>       buf=0x7d2c18 data=0x7d2c2c o=0 p=0 rsp.next=0 i=0 size=0
> 0x8024aed80: [10/Apr/2018:21:09:29.621432] id=9 proto=unix_stream
> source=unix:1
>   flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
>   frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
>   backend=<NONE> (id=-1 mode=-)
>   server=<NONE> (id=-1)
>   task=0x80242d780 (state=0x02 nice=-64 calls=1 exp=10s tmask=0x1 age=0s)
>   si[0]=0x8024aefc8 (state=EST flags=0x08 endp0=NONE:0x8024a8530
> exp=<NEVER>, et=0x000)
>   si[1]=0x8024aeff0 (state=EST flags=0x4018 endp1=APPCTX:0x80243cc00
> exp=<NEVER>, et=0x000)
>   co0=0x80243ca80 ctrl=unix_stream xprt=RAW mux=PASS data=STRM
> target=LISTENER:0x80243c180
>       flags=0x00203306 fd=12 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   app1=0x80243cc00 st0=7 st1=0 st2=3 applet=<CLI> tmask=0x1
>   req=0x8024aed90 (f=0x40c08202 an=0x0 pipe=0 tofwd=-1 total=15)
>       an_exp=<NEVER> rex=10s wex=<NEVER>
>       buf=0x802507840 data=0x802507854 o=0 p=0 req.next=0 i=0 size=16384
>   res=0x8024aedd0 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1393)
>       an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
>       buf=0x802501900 data=0x802501914 o=1393 p=1393 rsp.next=0 i=0
> size=16384
>
Hi Pieter,

On Thu, Apr 12, 2018 at 12:16:21AM +0200, PiBa-NL wrote:
> Hi List / Willy,
>
> Removing the line below 'fixes' my issue with kqueue poller and NTLM
> authentication with option http-tunnel..
> Though i'm sure something is then broken then horribly also (CPU go's
> 100%..). And i'm not sure what the proper fix would be. (ive got to little
> knowledge of what the various flags do and C++ aint a language i normally
> ever look at.. )

Don't worry for this, it's already great that you went that far!

> The 'breaking' commit was this one: http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=f839593dd26ec210ba66d74b2a4c2040dd1ed806
>
> Can you take a new look at that piece of code? (as the commit was yours ;) )
> Thanks in advance :).

Thank you very much for pointing the exact line that causes you trouble.
I'm pretty sure that your fix breaks something else and causes some events
to possibly be missed in some cases, but at the moment I'm having a hard
time figuring the details. It could be that an event (stop reading) is
improperly reported through the mux in tunnel mode.

Would you have the ability to try the latest 1.9-dev just by chance ? I'm
interested, since the FDs work a bit differently there. If it happens not
to malfunction it could help us compare the behaviours between the two to
more easily spot the culprit.

Thanks!
Willy
Hi Willy,

Op 12-4-2018 om 1:19 schreef Willy Tarreau:
> Thank you very much for pointing the exact line that causes you trouble.
Well exact line.. probably not the right one. And yes just removing that
line indeed breaks something else. (as expected..)
> Would you have the ability to try the latest 1.9-dev just by chance ?
Yes when i started compiling from source, thats where i started, the
current master branch. Though (sadly) it has the same broken effect.

Ill try and investigate a bit more.. Though if you think of a possible
patch, i'm happy to try anything out ;)

Regards,
PiBa-NL (Pieter)
Hi Willy,

Okay did some more digging..

before and after the 'offending' commit the EV_SET calls are as follows
in attached screenshot, these seem to be the main cause of some things
going wrong.
(I think also without using http-tunnel there is some things that dont
fully work right, but i havn't investigated that to get something
reproducible though, and possibly caused by the same kqueue issue..) The
current issue is 100% reproducible sofar in a (for me) easy way.

I made some changes to haproxy's output, and captured and compared the
output.
The 'START State' is written and also the eo and en variables are added.
to the output.
the READ ADD/DEL WRITE ADD/DEL output is written with fprintf statements
just before the actual EV_SET call..
Left side 'WORKS', right side doesn't..

As you can see there are some different calls to the EV_SET with the
different eo and en states..

I could 'fix' the behavior after the offending commit by adding back a
check for the eo in the second screenshot.. (At least for this 1
particular page request..)

But in current master that 'eo' variable nolonger exists. I'm kinda at
the end of my abilities here.. :)
I hope at least some of it makes sense to you?


And then a screenshot with the states (and polled_mask) as they happen
with 1.9master (of 6-4-2018 19:16) right side, it jump into the first
part of the code "fd totally removed from poll list" more often (they
have the "1" in READ 1 DEL&WRITE 1 DEL):

######## Config used ###########

global
  maxconn 100
defaults
  mode http
listen stats
  bind *:80
  stats enable
  stats admin if TRUE
  stats uri /stats
  stats refresh 5
  option httpchk
  option http-tunnel
  server webserver 192.168.13.44:443 ssl verify none #check check-ssl

Now i hope you can tell what to try next, or have an idea for a possible
patch?

Ill let it rest for a few days :).

Regards,

PiBa-NL (Pieter)
Attachments:
open | download - kekpajdicaepicmp.png (41.8 KB)
open | download - haaeoekaombidkkm.png (25.3 KB)
open | download - cajadnfilamjplkf.png (51.5 KB)
Hi Pieter,

On Fri, Apr 13, 2018 at 12:05:26AM +0200, PiBa-NL wrote:
> Hi Willy,
>
> Okay did some more digging..
>
> before and after the 'offending' commit the EV_SET calls are as follows in
> attached screenshot, these seem to be the main cause of some things going
> wrong.
> (I think also without using http-tunnel there is some things that dont fully
> work right, but i havn't investigated that to get something reproducible
> though, and possibly caused by the same kqueue issue..) The current issue is
> 100% reproducible sofar in a (for me) easy way.

OK that's nice (except for you of course).

> I made some changes to haproxy's output, and captured and compared the
> output.
> The 'START State' is written and also the eo and en variables are added. to
> the output.
> the READ ADD/DEL WRITE ADD/DEL output is written with fprintf statements
> just before the actual EV_SET call..
> Left side 'WORKS', right side doesn't..
>
> As you can see there are some different calls to the EV_SET with the
> different eo and en states..

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

> I could 'fix' the behavior after the offending commit by adding back a check
> for the eo in the second screenshot.. (At least for this 1 particular page
> request..)
>
> But in current master that 'eo' variable nolonger exists. I'm kinda at the
> end of my abilities here.. :)

Indeed, it took us years to get rid of this duplicate information! And as
you can see apparently doing so possibly revealed that some corner cases
were still relying on it. By the way you must really not try to debug an
old version but stick to the latest fixes. It's too easy to be caught by
another issue that was since fixed. The fact that you could put "eo" there
in your test indicates to me that the test was made on a commit prior to
its removal. I just want to be sure you don't waste your time.

> I hope at least some of it makes sense to you?

In part, yes! I have to sit down in front of it and scratch my head, but
my context switches are rather hard, so I'm very happy that you provide
such detailed information!

I'm seeing two things that could be of interest to test :
- remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
to delete the events. It will slightly inflate the list of events
but not that much. If it fixes the problem it means that the
polled_mask is sometimes wrong. Please do that with the updated
master.

- switch to poll() just to see if you have the same so that we can
figure if only the kqueue code triggers the issue. poll() doesn't
rely on polled_mask at all.

Many thanks for your tests.
Willy
Hi Willy,

Op 13-4-2018 om 0:22 schreef Willy Tarreau:
> By the way you must really not try to debug an
> old version but stick to the latest fixes.
Okay testing from now on with current master, just thought it would be
easier to backtrack if i knew what particular new/missing event would
possibly cause it. And it could have been simpler to find a fix just
after the problem was introduced, but it seems it ain't that simple :).
>
> I'm seeing two things that could be of interest to test :
> - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
> to delete the events. It will slightly inflate the list of events
> but not that much. If it fixes the problem it means that the
> polled_mask is sometimes wrong. Please do that with the updated
> master.
Removing the 'if polled_mask' does not fix the issue, in fact that makes
it worse. the "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also not
shown anymore without those checks..
>
> - switch to poll() just to see if you have the same so that we can
> figure if only the kqueue code triggers the issue. poll() doesn't
> rely on polled_mask at all.
Using poll (startup with -dk) the request works properly.
>
> Many thanks for your tests.
> Willy

Regards,

PiBa-NL (Pieter)
Hi Willy,

And a second mail as i just thought of one extra thing you wrote that
maybe i misunderstand or perhaps confused you with a small remark about
cpu usage in my earlier mail (that was a side effect of my other earlier
but totally wrong code change..).
>> I'm suspecting we could have something wrong with the polled_mask, maybe
>> sometimes it's removed too early somewhere, preventing the delete(write)
>> from being performed, which would explain why it loops.
>>
To clarify the issue is not that haproxy uses cpu by looping, the issue
is that haproxy prevents the page from loading in the browser. The 'fix'
on the old version after the commit introducing the issue was to call
the EV_SET write delete *less* often. Or maybe my understanding of what
is does is just wrong :).

Op 13-4-2018 om 0:57 schreef PiBa-NL:
> Hi Willy,
>
> Op 13-4-2018 om 0:22 schreef Willy Tarreau:
>> I'm suspecting we could have something wrong with the polled_mask, maybe
>> sometimes it's removed too early somewhere, preventing the delete(write)
>> from being performed, which would explain why it loops.
>>
>> By the way you must really not try to debug an
>> old version but stick to the latest fixes.
> Okay testing from now on with current master, just thought it would be
> easier to backtrack if i knew what particular new/missing event would
> possibly cause it. And it could have been simpler to find a fix just
> after the problem was introduced, but it seems it ain't that simple :).
>>
>> I'm seeing two things that could be of interest to test :
>>    - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
>>      to delete the events. It will slightly inflate the list of events
>>      but not that much. If it fixes the problem it means that the
>>      polled_mask is sometimes wrong. Please do that with the updated
>>      master.
> Removing the 'if polled_mask' does not fix the issue, in fact that
> makes it worse. the "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is
> also not shown anymore without those checks..
>>
>>    - switch to poll() just to see if you have the same so that we can
>>      figure if only the kqueue code triggers the issue. poll() doesn't
>>      rely on polled_mask at all.
> Using poll (startup with -dk) the request works properly.
>>
>> Many thanks for your tests.
>> Willy
>
> Regards,
>
> PiBa-NL (Pieter)
>

Regards,

PiBa-NL (Pieter)
Hi Pieter,

On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote:
> To clarify the issue is not that haproxy uses cpu by looping, the issue is
> that haproxy prevents the page from loading in the browser. The 'fix' on the
> old version after the commit introducing the issue was to call the EV_SET
> write delete *less* often. Or maybe my understanding of what is does is just
> wrong :).

Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping.
So if it's indeed missing events, it's different. Maybe we have an issue
with a sequence of enable/disable on the polling, causing it to be disabled
and not re-enabled sometimes. If you can test with "-dk" to disable kqueue
and switch to poll, that will be great because I'll be able to test under
the same conditions. And if it doesn't fail it will tell us it's really
related to the kqueue poller.

Willy
Using poll (startup with -dk) the request works properly.

Op vr 13 apr. 2018 08:29 schreef Willy Tarreau <[email protected]>:

> Hi Pieter,
>
> On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote:
> > To clarify the issue is not that haproxy uses cpu by looping, the issue
> is
> > that haproxy prevents the page from loading in the browser. The 'fix' on
> the
> > old version after the commit introducing the issue was to call the EV_SET
> > write delete *less* often. Or maybe my understanding of what is does is
> just
> > wrong :).
>
> Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping.
> So if it's indeed missing events, it's different. Maybe we have an issue
> with a sequence of enable/disable on the polling, causing it to be disabled
> and not re-enabled sometimes. If you can test with "-dk" to disable kqueue
> and switch to poll, that will be great because I'll be able to test under
> the same conditions. And if it doesn't fail it will tell us it's really
> related to the kqueue poller.
>
> Willy
>
On Fri, Apr 13, 2018 at 06:50:50AM +0000, Pi Ba wrote:
> Using poll (startup with -dk) the request works properly.

Thank you Pieter!

Willy
Hi Pieter,

On Fri, Apr 13, 2018 at 06:50:50AM +0000, Pi Ba wrote:
> Using poll (startup with -dk) the request works properly.

After some discussion with Willy, we came with a solution that may fix your
problem with kqueue.
Can you test the attached patch and let me know if it fixes it for you ?

Thanks !

Olivier
From 3c0a505e5f163989239ffb5267ddf7c1ed549fb9 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 16 Apr 2018 13:24:48 +0200
Subject: [PATCH] BUG/MEDIUM: When adding new events, provide an output to get
errors.

When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.

This should be backported to 1.8.
---
src/ev_kqueue.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece9d..faf3ba2c2 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -31,6 +31,7 @@
/* private data */
static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
static THREAD_LOCAL struct kevent *kev = NULL;
+static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in

/*
* kqueue() poller
@@ -43,6 +44,8 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
int updt_idx, en;
int changes = 0;

+ timeout.tv_sec = 0;
+ timeout.tv_nsec = 0;
/* first, scan the update list to find changes */
for (updt_idx = 0; updt_idx < fd_nbupdt; updt_idx++) {
fd = fd_updt[updt_idx];
@@ -82,12 +85,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
}
}
if (changes)
- kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
+ kevent(kqueue_fd[tid], kev, changes, kev_out, changes, &timeout);
fd_nbupdt = 0;

delta_ms = 0;
- timeout.tv_sec = 0;
- timeout.tv_nsec = 0;

if (!exp) {
delta_ms = MAX_DELAY_MS;
@@ -194,6 +195,10 @@ REGPRM1 static int _do_init(struct poller *p)
{
p->private = NULL;

+ kev_out = calloc(1, sizeof(struct kevent) * 2 * global.maxsock);
+ if (!kev_out)
+ goto fail_alloc;
+
kqueue_fd[tid] = kqueue();
if (kqueue_fd[tid] < 0)
goto fail_fd;
@@ -203,6 +208,9 @@ REGPRM1 static int _do_init(struct poller *p)
return 1;

fail_fd:
+ free(kev_out);
+ kev_out = NULL;
+fail_alloc:
p->pref = 0;
return 0;
}
@@ -220,6 +228,10 @@ REGPRM1 static void _do_term(struct poller *p)

p->private = NULL;
p->pref = 0;
+ if (kev_out) {
+ free(kev_out);
+ kev_out = NULL;
+ }
}

/*
--
2.16.3
Hi,

On Mon, Apr 16, 2018 at 03:37:34PM +0200, Olivier Houchard wrote:
> Hi Pieter,
>
> On Fri, Apr 13, 2018 at 06:50:50AM +0000, Pi Ba wrote:
> > Using poll (startup with -dk) the request works properly.
>
> After some discussion with Willy, we came with a solution that may fix your
> problem with kqueue.
> Can you test the attached patch and let me know if it fixes it for you ?
>
> Thanks !
>
> Olivier


Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
scanning needlessly the kqueue.

Regards,

Olivier
From 2229159329ec539c7875943af08c539064dcd76b Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 16 Apr 2018 13:24:48 +0200
Subject: [PATCH] BUG/MEDIUM: When adding new events, provide an output to get
errors.

When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.

This should be backported to 1.8.
---
src/ev_kqueue.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece9d..4306c4372 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -31,6 +31,7 @@
/* private data */
static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
static THREAD_LOCAL struct kevent *kev = NULL;
+static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in

/*
* kqueue() poller
@@ -43,6 +44,8 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
int updt_idx, en;
int changes = 0;

+ timeout.tv_sec = 0;
+ timeout.tv_nsec = 0;
/* first, scan the update list to find changes */
for (updt_idx = 0; updt_idx < fd_nbupdt; updt_idx++) {
fd = fd_updt[updt_idx];
@@ -81,13 +84,15 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
}
}
- if (changes)
- kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
+ if (changes) {
+#ifdef EV_RECEIPT
+ kev[0].flags |= EV_RECEIPT;
+#endif
+ kevent(kqueue_fd[tid], kev, changes, kev_out, changes, &timeout);
+ }
fd_nbupdt = 0;

delta_ms = 0;
- timeout.tv_sec = 0;
- timeout.tv_nsec = 0;

if (!exp) {
delta_ms = MAX_DELAY_MS;
@@ -194,6 +199,10 @@ REGPRM1 static int _do_init(struct poller *p)
{
p->private = NULL;

+ kev_out = calloc(1, sizeof(struct kevent) * 2 * global.maxsock);
+ if (!kev_out)
+ goto fail_alloc;
+
kqueue_fd[tid] = kqueue();
if (kqueue_fd[tid] < 0)
goto fail_fd;
@@ -203,6 +212,9 @@ REGPRM1 static int _do_init(struct poller *p)
return 1;

fail_fd:
+ free(kev_out);
+ kev_out = NULL;
+fail_alloc:
p->pref = 0;
return 0;
}
@@ -220,6 +232,10 @@ REGPRM1 static void _do_term(struct poller *p)

p->private = NULL;
p->pref = 0;
+ if (kev_out) {
+ free(kev_out);
+ kev_out = NULL;
+ }
}

/*
@@ -250,6 +266,7 @@ REGPRM1 static int _do_fork(struct poller *p)
return 1;
}

+
/*
* It is a constructor, which means that it will automatically be called before
* main(). This is GCC-specific but it works at least since 2.95.
--
2.14.3
Hi Olivier,

Op 16-4-2018 om 17:09 schreef Olivier Houchard:
> After some discussion with Willy, we came with a solution that may fix your
> problem with kqueue.
> Can you test the attached patch and let me know if it fixes it for you ?
>
> Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
> scanning needlessly the kqueue.
>
> Regards,
>
> Olivier

Thanks the patch solves the issue i experienced at least for the
testcase that i had. (And doesn't seem to cause obvious new issues that
i could quickly spot..) Both with and without EV_RECEIPT on kev[0] it
seems to work the same for my testcase..

Just a few thoughts though:
Now only the first kev[0] gets the EV_RECEIPT flag, shouldn't it be
added to all items in the array? Now sometimes 3 changes are send and
only 2 'results' are reported back. If i read right the EV_RECEIPT
should 'force' a result for each change send. Also is there a reason you
put it inside a '#ifdef' ? It seems to me a hard requirement to not read
any possible pending events when sending the list of updated filters at
that moment.?. Or perhaps its possible to call kevent only once? Both
sending changes, and receiving new events in 1 big go and without the
RECEIPT flag?

There are now more 'changes' send than required that need to be
disregarded with a 'error flag' by kqueue
Doesn't that (slightly) affect performance? Or would checking a bitmask
beforehand not be cheaper than what kevent itself needs to do to ignore
an item and 'error report' some of the changes.?. I've not tried to
measure this, but technically i think there will be a few more cpu
operations needed overall this way.?.

Regards,

PiBa-NL (Pieter)
Hi Pieter,

On Mon, Apr 16, 2018 at 10:41:48PM +0200, PiBa-NL wrote:
> Hi Olivier,
>
> Op 16-4-2018 om 17:09 schreef Olivier Houchard:
> > After some discussion with Willy, we came with a solution that may fix your
> > problem with kqueue.
> > Can you test the attached patch and let me know if it fixes it for you ?
> >
> > Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
> > scanning needlessly the kqueue.
> >
> > Regards,
> >
> > Olivier
>
> Thanks the patch solves the issue i experienced at least for the testcase
> that i had. (And doesn't seem to cause obvious new issues that i could
> quickly spot..) Both with and without EV_RECEIPT on kev[0] it seems to work
> the same for my testcase..
>
> Just a few thoughts though:
> Now only the first kev[0] gets the EV_RECEIPT flag, shouldn't it be added to
> all items in the array? Now sometimes 3 changes are send and only 2
> 'results' are reported back. If i read right the EV_RECEIPT should 'force' a
> result for each change send. Also is there a reason you put it inside a
> '#ifdef' ? It seems to me a hard requirement to not read any possible
> pending events when sending the list of updated filters at that moment.?. Or
> perhaps its possible to call kevent only once? Both sending changes, and
> receiving new events in 1 big go and without the RECEIPT flag?
>

We only need one to prevent kevent() from trying to scanning the kqueue, so
only setting kev[0] should be enough. It's inside an #ifdef because
EV_RECEIPT was only implemented recently in OpenBSD, so all users may not have
it.
My hope is, if EV_RECEIPT is not set, we will indeed read even needlessly,
but we will read them again at the next kevent() call. If it does not, and
events are lost, then we'll have to add an extra entry that generates an error.
We considered using only one kevent() call for both adding events, and reading
them, but it makes it difficult to respect maxpollevents, because we'd need
enough room for all errors and reading the new events.

> There are now more 'changes' send than required that need to be disregarded
> with a 'error flag' by kqueue
> Doesn't that (slightly) affect performance? Or would checking a bitmask
> beforehand not be cheaper than what kevent itself needs to do to ignore an
> item and 'error report' some of the changes.?. I've not tried to measure
> this, but technically i think there will be a few more cpu operations needed
> overall this way.?.

It should be fine. What kevent() does is :
foreach (change) {
ret = process_change();
if (ret != 0 || flags & EV_RECEIPT)
nerrors++;
}
if (nerrors > 0)
return nerrors;
return scan_kqueue();

So it should virtually have no impact on performances.

Regards,

Olivier
Hi again,

On Tue, Apr 17, 2018 at 01:07:49PM +0200, Olivier Houchard wrote:
[...]
> We only need one to prevent kevent() from trying to scanning the kqueue, so
> only setting kev[0] should be enough. It's inside an #ifdef because
> EV_RECEIPT was only implemented recently in OpenBSD, so all users may not have
> it.
> My hope is, if EV_RECEIPT is not set, we will indeed read even needlessly,
> but we will read them again at the next kevent() call. If it does not, and
> events are lost, then we'll have to add an extra entry that generates an error.

After talking with Willy, here is an updated patch that does that.
That way, the day we'll want to use EV_ONESHOT, we'll be ready, and won't
miss any event.

Regards,

Olivier

From 56701f85b3311657ea6915df94ec75ac7a83eb54 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 16 Apr 2018 13:24:48 +0200
Subject: [PATCH] BUG/MEDIUM: kqueue: When adding new events, provide an output
to get errors.

When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.

This should be backported to 1.8.
---
src/ev_kqueue.c | 43 +++++++++++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece9d..ebfd5d210 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -31,6 +31,7 @@
/* private data */
static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
static THREAD_LOCAL struct kevent *kev = NULL;
+static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in

/*
* kqueue() poller
@@ -43,6 +44,8 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
int updt_idx, en;
int changes = 0;

+ timeout.tv_sec = 0;
+ timeout.tv_nsec = 0;
/* first, scan the update list to find changes */
for (updt_idx = 0; updt_idx < fd_nbupdt; updt_idx++) {
fd = fd_updt[updt_idx];
@@ -81,13 +84,21 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
}
}
- if (changes)
- kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
+ if (changes) {
+#ifdef EV_RECEIPT
+ kev[0].flags |= EV_RECEIPT;
+#else
+ /* If EV_RECEIPT isn't defined, just add an invalid entry,
+ * so that we get an error and kevent() stops before scanning
+ * the kqueue.
+ */
+ EV_SET(&kev[changes++], -1, EVFILT_WRITE, EV_DELETE, 0, 0, NULL);
+#endif
+ kevent(kqueue_fd[tid], kev, changes, kev_out, changes, &timeout);
+ }
fd_nbupdt = 0;

delta_ms = 0;
- timeout.tv_sec = 0;
- timeout.tv_nsec = 0;

if (!exp) {
delta_ms = MAX_DELAY_MS;
@@ -150,8 +161,12 @@ static int init_kqueue_per_thread()
{
int fd;

- /* we can have up to two events per fd (*/
- kev = calloc(1, sizeof(struct kevent) * 2 * global.maxsock);
+ /* we can have up to two events per fd, so allocate enough to store
+ * 2*fd event, and an extra one, in case EV_RECEIPT isn't defined,
+ * so that we can add an invalid entry and get an error, to avoid
+ * scanning the kqueue uselessly.
+ */
+ kev = calloc(1, sizeof(struct kevent) * (2 * global.maxsock + 1));
if (kev == NULL)
goto fail_alloc;

@@ -194,6 +209,15 @@ REGPRM1 static int _do_init(struct poller *p)
{
p->private = NULL;

+ /* we can have up to two events per fd, so allocate enough to store
+ * 2*fd event, and an extra one, in case EV_RECEIPT isn't defined,
+ * so that we can add an invalid entry and get an error, to avoid
+ * scanning the kqueue uselessly.
+ */
+ kev_out = calloc(1, sizeof(struct kevent) * (2 * global.maxsock + 1));
+ if (!kev_out)
+ goto fail_alloc;
+
kqueue_fd[tid] = kqueue();
if (kqueue_fd[tid] < 0)
goto fail_fd;
@@ -203,6 +227,9 @@ REGPRM1 static int _do_init(struct poller *p)
return 1;

fail_fd:
+ free(kev_out);
+ kev_out = NULL;
+fail_alloc:
p->pref = 0;
return 0;
}
@@ -220,6 +247,10 @@ REGPRM1 static void _do_term(struct poller *p)

p->private = NULL;
p->pref = 0;
+ if (kev_out) {
+ free(kev_out);
+ kev_out = NULL;
+ }
}

/*
--
2.14.3
On Tue, Apr 17, 2018 at 04:33:07PM +0200, Olivier Houchard wrote:
> After talking with Willy, here is an updated patch that does that.
> That way, the day we'll want to use EV_ONESHOT, we'll be ready, and won't
> miss any event.

Now merged, thanks guys!
Willy
Op 17-4-2018 om 17:46 schreef Willy Tarreau:
> On Tue, Apr 17, 2018 at 04:33:07PM +0200, Olivier Houchard wrote:
>> After talking with Willy, here is an updated patch that does that.
>> That way, the day we'll want to use EV_ONESHOT, we'll be ready, and won't
>> miss any event.
> Now merged, thanks guys!
> Willy

Thanks!
Works as expected with current master.

Do i dare ask a estimate when 1.8.8 might be released? (oh just did. ;) )
O well, i guess ill run production with nokqueue for a little while,
that works alright for me sofar.

Regards,
PiBa-NL (Pieter)
On Tue, Apr 17, 2018 at 11:47:18PM +0200, PiBa-NL wrote:
> Op 17-4-2018 om 17:46 schreef Willy Tarreau:
> > On Tue, Apr 17, 2018 at 04:33:07PM +0200, Olivier Houchard wrote:
> > > After talking with Willy, here is an updated patch that does that.
> > > That way, the day we'll want to use EV_ONESHOT, we'll be ready, and won't
> > > miss any event.
> > Now merged, thanks guys!
> > Willy
>
> Thanks!
> Works as expected with current master.
>
> Do i dare ask a estimate when 1.8.8 might be released? (oh just did. ;) )

Probably tomorrow evening, with another fix we're working on.

> O well, i guess ill run production with nokqueue for a little while, that
> works alright for me sofar.

Or keep running on 1.8.7 with this patch ;-)

Willy
Sorry, only registered users may post in this forum.

Click here to login