Welcome! Log In Create A New Profile

Advanced

Issue 106 in memcached: binary protocol parsing can cause memcached server lockup

Posted by Anonymous User 
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 106 by mar...@hyves.nl: binary protocol parsing can cause
memcached server lockup
http://code.google.com/p/memcached/issues/detail?id=106

What steps will reproduce the problem?
1. run a memcached server on localhost port 10000
2. compile and run the attached c code
3. change the len from 37 to 38 on line 69, recompile and run the code
again.

What is the expected output? What do you see instead?

use top and watch the cpu stats for step 2 and 3. Even after stopping the
programs from step 3 the memcached server is using 100% cpu. For step 2 and
3 I expect for the load to drop to 0 after stopping the program.

What version of the product are you using? On what operating system?
memcached v1.4.3 on linux 2.6 compiled libevent using epoll.

Please provide any additional information below.

The attached code currently be only compiled under linux 2.6 with epoll
support.

Please note that by changing the length of 37 to 38 bytes I broke the
memcached binary spec intentionally. My point is that memcached should have
better protocol handling to protect itself against faulty packets.

Attachments:
test.c 1.9 KB

--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings
Comment #1 on issue 106 by pi3orama: binary protocol parsing can cause
memcached server lockup
http://code.google.com/p/memcached/issues/detail?id=106

The root cause of this issue is try_read_udp() never reset c->rbytes.

The processing of conn_nread minus c->rbytes by c->rlbytes, so if extra
bytes exists, they will be 'left' in c->rbytes. However, each time when
recvfrom()s UDP socket, the whole read buffer is overwritten, so 'c->rbytes
+= res' sets c->rbytes incorrectly.

After the binary 'GET' command fail, the state of connection is transferred
to 'conn_new_cmd' in write_bin_error(), then 'conn_parse_cmd' in
reset_cmd_handler(). When c->rbytes less than sizeof(c->binary_header) (24
bytes), the server believes that there's no enough bytes to hold binary
header so the processing continues normally. However, in the processing of
attached test program, c->rbytes increases by 1 each time a request is
completed. Therefore, after 24 requests, server (wrongly) thinks there's an
extra binary header, and it finally gets wrong magic in try_read_command().

The dead lock is caused by the error processing code. If the UDP connection
is in conn_closing state, it never get closed but only calls
conn_cleanup(). Therefore, when there are incoming packets in the udp
socket, they will never be consumed, so epoll_wait() (in libevent) triggers
input events over and over again for a closing connection.


To set c->rbytes correctly:
--- ./memcached.ori.c 2010-08-23 22:59:12.000000000 +0800
+++ ./memcached.c 2010-08-23 22:59:17.000000000 +0800
@@ -3223,7 +3223,7 @@
res -= 8;
memmove(c->rbuf, c->rbuf + 8, res);

- c->rbytes += res;
+ c->rbytes = res;
c->rcurr = c->rbuf;
return READ_DATA_RECEIVED;
}


To consume incoming packets when cleanup:

--- ./memcached.ori.c 2010-08-23 22:59:12.000000000 +0800
+++ ./memcached.c 2010-08-23 23:27:38.000000000 +0800
@@ -471,6 +471,9 @@
sasl_dispose(&c->sasl_conn);
c->sasl_conn = NULL;
}
+
+ /* consume incoming packets */
+ recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
}

/*
Comment #2 on issue 106 by pi3orama: binary protocol parsing can cause
memcached server lockup
http://code.google.com/p/memcached/issues/detail?id=106

The issue is studied and solved under the help of Snitchaser:
http://gitorious.org/snitchaser
Sorry, only registered users may post in this forum.

Click here to login