Linux kernel crash with haproxy 1.5-dev11 and ipv6 listener

Posted by Stephen Balukoff 
July 19, 2012 08:20PM
Hello y'all!

So, I'm attempting to use haproxy to load balance an IPv6 listener
with an IPv6 backend. The interesting problem I'm running into is that
I'm able to reliably crash the Linux kernel I'm using. Has anyone
else run into a similar issue? (Obviously, this feels like a kernel
bug to me: a user-space program ought not to be able to crash the
kernel. But still, I do wonder if there's something I'm doing that
is particularly wrong in this case.)

The kernel is the Scientific Linux port of the latest RHEL 6.2 kernel:
2.6.32-279.1.1.el6.x86_64
The haproxy version I'm experimenting with is 1.5-dev11, built as an RPM
using the haproxy.spec file included with the source.

I've tried this on other 2.6.32 kernels with similar results.

Here's the pertinent portion of the crash log:

BUG: unable to handle kernel paging request at ffffc90737275ab8
IP: [<ffffffffa03108f8>] inet6_csk_search_req+0x48/0x130 [ipv6]
PGD 23feb8067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Modules linked in: ip6table_filter ip6_tables xt_comment
iptable_filter ip_tables bonding 8021q garp stp llc ipv6 xfs exportfs
microcode serio_raw sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support
ioatdma i7core_edac edac_core igb dca ext4 mbcache jbd2 sr_mod cdrom
sd_mod crc_t10dif ahci 3w_sas dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.32-279.1.1.el6.x86_64 #1
Supermicro X8DTU/X8DTU
RIP: 0010:[<ffffffffa03108f8>] [<ffffffffa03108f8>] inet6_csk_search_req+0x48/0x130 [ipv6]
RSP: 0018:ffff88002f663a30 EFLAGS: 00010206
RAX: 00000000e4af6154 RBX: ffff88023774e838 RCX: 00000000ffffffff
RDX: 00000000e4af6156 RSI: 00000000d675f11e RDI: 00000000f7a705c2
RBP: ffff88002f663a70 R08: 0000000062ea86fc R09: 00000000ff7e638b
R10: ffff8802364cd050 R11: 0000000000000000 R12: ffff88023774e848
R13: 0000000000002891 R14: 0000000000000004 R15: ffffc90011ac5000
FS: 0000000000000000(0000) GS:ffff88002f660000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffc90737275ab8 CR3: 0000000239027000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88023afc6000, task ffff88023afc2aa0)
Stack:
ffff88002f663a40 ffff88002f663ab8 ffff88002f663a70 ffff880237750780
<d> ffff8802364cd040 0000000000000000 ffff88023774e858 ffff88023774e830
<d> ffff88002f663b10 ffffffffa0309efd ffffffffa010e060 ffff880239251760
Call Trace:
<IRQ>
[<ffffffffa0309efd>] tcp_v6_do_rcv+0x38d/0x5b0 [ipv6]
[<ffffffffa02f2080>] ? ip6_pol_route_input+0x0/0x20 [ipv6]
[<ffffffffa030bbe0>] tcp_v6_rcv+0x560/0x870 [ipv6]
[<ffffffff814665f9>] ? nf_iterate+0x69/0xb0
[<ffffffffa02e67fa>] ip6_input_finish+0x16a/0x410 [ipv6]
[<ffffffffa02e6af8>] ip6_input+0x58/0x60 [ipv6]
[<ffffffffa02e621f>] ip6_rcv_finish+0x3f/0x50 [ipv6]
[<ffffffffa02e65b8>] ipv6_rcv+0x388/0x460 [ipv6]
[<ffffffff8143a7cb>] __netif_receive_skb+0x49b/0x6f0
[<ffffffff8143ca48>] netif_receive_skb+0x58/0x60
[<ffffffff8143cb50>] napi_skb_finish+0x50/0x70
[<ffffffff8143f089>] napi_gro_receive+0x39/0x50
[<ffffffffa01223b4>] igb_poll+0x864/0xb00 [igb]
[<ffffffff81060456>] ? rebalance_domains+0x1a6/0x5a0
[<ffffffff81096112>] ? enqueue_hrtimer+0x82/0xd0
[<ffffffff8143f1a3>] net_rx_action+0x103/0x2f0
[<ffffffff81073ec1>] __do_softirq+0xc1/0x1e0
[<ffffffff810db810>] ? handle_IRQ_event+0x60/0x170
[<ffffffff8100c24c>] call_softirq+0x1c/0x30
[<ffffffff8100de85>] do_softirq+0x65/0xa0
[<ffffffff81073ca5>] irq_exit+0x85/0x90
[<ffffffff81505b05>] do_IRQ+0x75/0xf0
[<ffffffff8100ba53>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff812cd8de>] ? intel_idle+0xde/0x170
[<ffffffff812cd8c1>] ? intel_idle+0xc1/0x170
[<ffffffff81407637>] cpuidle_idle_call+0xa7/0x140
[<ffffffff81009e06>] cpu_idle+0xb6/0x110
[<ffffffff814f6cef>] start_secondary+0x22a/0x26d
Code: 08 03 00 00 48 89 cb 41 89 d5 48 89 df 4d 89 c4 41 0f b7 f5 45
89 ce 41 0f b7 4f 14 41 8b 57 10 e8 6e fa ff ff 89 c2 48 83 c2 02 <49>
8b 44 d7 08 48 85 c0 0f 84 86 00 00 00 4d 8d 7c d7 08 eb 09
RIP [<ffffffffa03108f8>] inet6_csk_search_req+0x48/0x130 [ipv6]
RSP <ffff88002f663a30>
CR2: ffffc90737275ab8
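
A quick consistency check on the oops, assuming the bytes just before the marked <49> decode as `mov edx, eax` / `add rdx, 2` / `mov rax, [r15 + rdx*8 + 8]` (a hand decoding of the Code: line, not disassembler output). Plugging R15 and RAX from the register dump into that addressing mode reproduces the faulting address in CR2 exactly. The index returned by the preceding call (0xe4af6154) looks far too large to be a valid slot in an in-memory table, which would fit an out-of-bounds lookup inside inet6_csk_search_req; that is an inference from the registers, not a confirmed diagnosis.

```python
# Sanity check of the faulting address in the oops above.
# Register values are copied from the dump; the instruction decoding in the
# comments is a hand decoding of the "Code:" bytes (an assumption).
MASK64 = (1 << 64) - 1

R15 = 0xFFFFC90011AC5000   # base register used by the faulting load
RAX = 0x00000000E4AF6154   # return value of the call just before the fault
RDX = 0x00000000E4AF6156   # RDX as printed in the dump
CR2 = 0xFFFFC90737275AB8   # faulting address reported by the CPU

# 89 c2           mov edx, eax   (a 32-bit move zero-extends into rdx)
# 48 83 c2 02     add rdx, 2
rdx = (RAX & 0xFFFFFFFF) + 2

# 49 8b 44 d7 08  mov rax, [r15 + rdx*8 + 8]   <- the instruction marked <49>
addr = (R15 + rdx * 8 + 8) & MASK64

print(f"rdx  = {rdx:#x} (dump says {RDX:#x})")
print(f"addr = {addr:#x} (CR2 is {CR2:#x})")
print("matches CR2:", addr == CR2)
```

Both computed values agree with the dump (the last line prints `matches CR2: True`), so the crash is a plain load from a computed address that lies far outside anything mapped.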


And here's the config I'm using:
# Config file for cust44052_http_80_lbs6443

global
log /dev/haproxy-log news
maxconn 50000
user haproxy
group haproxy
daemon
pidfile /var/run/haproxy/haproxy.cust44052_http_80_lbs6443.pid
stats socket /var/lib/haproxy/stats.cust44052_http_80_lbs6443.sock
nosplice

defaults
log global
mode http
option httplog
option dontlognull
option dontlog-normal
retries 3
option redispatch
maxconn 50000
contimeout 5000
clitimeout 50000
srvtimeout 50000
option forwardfor
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
stats enable
stats hide-version
stats uri /bbg_haproxy_stats
stats realm BBG\ Haproxy\ Statistics
stats auth cust44052:i5gCkukscTV7pdpVR


balance roundrobin
option httpclose # disable keep-alive
#source

frontend cust44052_http_80_lbs6443
bind 2607:f700:8001:1b:1234:5678:abcd:beef:80
bind 199.91.168.52:80
acl site_dead nbsrv(default) lt 1
monitor fail if site_dead
default_backend default

backend default
option httpchk GET / HTTP/1.1\r\nHost:\ localhost

server will.c44052 2607:f700:8000:12e:dead:beef:1:449:80 check inter 5000 rise 2 fall 5
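
The bind and server lines above write IPv6 addresses without brackets. As far as I know, haproxy takes everything after the last colon as the port, so 2607:f700:8001:1b:1234:5678:abcd:beef:80 means address 2607:f700:8001:1b:1234:5678:abcd:beef, port 80. The sketch below makes that split explicit and validates the address part with Python's ipaddress module; `split_bind` is a hypothetical helper illustrating the convention, not haproxy's own parser.

```python
import ipaddress


def split_bind(spec: str) -> tuple[str, int]:
    """Split an unbracketed "address:port" string at its LAST colon.

    Hypothetical helper illustrating the convention used in the config
    above; it is not haproxy's parser and does not handle forms such as
    ":80" or "*:80" (those raise ValueError here).
    """
    addr, sep, port = spec.rpartition(":")
    if not sep:
        raise ValueError(f"no port in {spec!r}")
    ip = ipaddress.ip_address(addr)  # raises ValueError on a malformed address
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range in {spec!r}")
    return str(ip), port_num


for spec in ("2607:f700:8001:1b:1234:5678:abcd:beef:80",
             "199.91.168.52:80",
             "2607:f700:8000:12e:dead:beef:1:449:80"):
    ip, port = split_bind(spec)
    version = ipaddress.ip_address(ip).version
    print(f"{spec} -> address {ip} (IPv{version}), port {port}")
```

Splitting at the last colon works because a full IPv6 address never ends in a colon, so the final field is unambiguous once a port is appended.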


I can reliably trigger the crash just by trying to make a TCP
connection to 2607:f700:8001:1b:1234:5678:abcd:beef on port 80. This
does not happen when connecting to the IPv4 bind address above (and in
fact, I get the web response I would expect from the back-end).
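
For anyone trying to reproduce this, the trigger described above is a single TCP connection attempt to the IPv6 bind address. A minimal sketch in Python that makes exactly one attempt and reports whether the handshake completed; `try_connect` is a hypothetical helper, not part of haproxy. Only point it at a test machine, since on an affected kernel the box reportedly goes down on essentially the first packet.

```python
import socket


def try_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Make one TCP connection attempt to host:port and close it again.

    socket.create_connection() resolves the host with getaddrinfo(), so an
    IPv6 literal yields an AF_INET6 socket and an IPv4 literal an AF_INET one.
    Returns True if the handshake completed, False on refusal or timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, socket.timeout, unreachable, ...
        return False
```

Usage: `try_connect("2607:f700:8001:1b:1234:5678:abcd:beef", 80)` exercises the IPv6 listener, while `try_connect("199.91.168.52", 80)` exercises the IPv4 one, which per the report behaves normally.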

Thanks,
Stephen

--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807
Hi Stephen,

On Thu, Jul 19, 2012 at 11:16:18AM -0700, Stephen Balukoff wrote:
> I've tried this on other 2.6.32 kernels with similar results.

I'm not aware of this; it's quite concerning. Have you tried with a
mainline 2.6.32.x kernel (e.g. 2.6.32.59)?

There certainly is a bug in the kernel, but we don't know whether it's only
in RHEL's specific code or also in mainline. That's important to know in
order to decide where it should be reported, because it clearly needs to
be reported.

We've run tests on 2.6.32.59 at Exceliance without ever hitting such an
issue, so it might be in RH's kernel. It may also depend on a config
setting.

Is it easy to reproduce, or do you need to wait for several days of IPv6
traffic?

Thanks,
Willy
Hi Willy,

I'm compiling a main-line 2.6.32.27 kernel and will let you know if
I'm able to crash this as easily as the RHEL-6 kernel. (I couldn't
find a 2.6.32.59 kernel-- it doesn't appear to be on ftp.kernel.org.)

And in any case, it's extremely reproducible: All I have to do is
attempt to make an IPv6 TCP connection to an IPv6 haproxy listener and
the kernel crashes instantly (i.e. reproducible with essentially one
packet).

Also note: I've verified that I get no crashing with a main-line 3.4.5 kernel.

Thanks,
Stephen

On Sat, Jul 21, 2012 at 2:58 AM, Willy Tarreau <[email protected]> wrote:

David du Colombier
Re: Linux kernel crash with haproxy 1.5-dev11 and ipv6 listener
July 25, 2012 10:40AM
> I'm compiling a main-line 2.6.32.27 kernel and will let you know if
> I'm able to crash this as easily as the RHEL-6 kernel. (I couldn't
> find a 2.6.32.59 kernel-- it doesn't appear to be on ftp.kernel.org.)

You can find the 2.6.32.59 kernel in the following directory:

ftp://ftp.kernel.org/pub/linux/kernel/v2.6/longterm/v2.6.32/

--
David du Colombier
Aah! Excellent! Didn't know how they were structuring the directories there.

I'm now compiling a 2.6.32.59 kernel for testing, eh.

Stephen

On Wed, Jul 25, 2012 at 1:29 AM, David du Colombier
<[email protected]> wrote:

On Tue, Jul 24, 2012 at 11:55:17AM -0700, Stephen Balukoff wrote:
> Hi Willy,
>
> I'm compiling a main-line 2.6.32.27 kernel and will let you know if
> I'm able to crash this as easily as the RHEL-6 kernel. (I couldn't
> find a 2.6.32.59 kernel-- it doesn't appear to be on ftp.kernel.org.)

You can find it here:

http://www.kernel.org/pub/linux/kernel/v2.6/longterm/v2.6.32/

I know that the move to the longterm subdir is a real mess for users,
but I didn't choose it :-/

It's really important to be on the updated version, because if the issue
was fixed between .27 and .59, there's no point trying to find whether you
can reproduce it or not. In short, if .59 is OK and RHEL is KO, then you
can simply report it to them and they'll be able to figure out what patch
in their kernel causes this. If it's also in mainline, we need to fix it
and the fix will naturally find its way to RHEL's kernel.

> And in any case, it's extremely reproducible: All I have to do is
> attempt to make an IPv6 TCP connection to an IPv6 haproxy listener and
> the kernel crashes instantly (ie. reproducible with essentially one
> packet).

Impressed!

> Also note: I've verified that I get no crashing with a main-line 3.4.5 kernel.

OK that's great!

Thanks,
Willy