Is memcache add() atomic on a multithreaded memcached?

elSchrom
Is memcache add() atomic on a multithreaded memcached?
October 13, 2010 06:40PM
Hi everyone,

we have the following situation: because of massive simultaneous inserts
into MySQL on possibly identical primary keys, we use the atomic memcache
add() as a semaphore. In a few cases we observed that two simultaneous
add() calls using the same key from different clients both returned true
(due to consistent hashing, the key should live on the same machine).
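Schematically, our construct looks like this (a simplified sketch, not our
actual code; the Python/pymemcache client and all names here are just for
illustration):

from pymemcache.client.base import Client

mc = Client(("memcached-host", 11211))  # illustrative host

def insert_once(x, compute_record):
    # add() succeeds only for the first caller; the 60s TTL is a safety net
    if not mc.add("lock:%s" % x, b"1", expire=60, noreply=False):
        return False                     # someone else holds the lock
    try:
        record = compute_record()        # expensive computation + DB insert;
        mc.set("record:%s" % x, record)  # record assumed to be bytes here
    finally:
        mc.delete("lock:%s" % x)
    return True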

Is it possible that a multithreaded memcached returns true for two
concurrent add() calls on the same key if the requests are handled by two
different threads on the same machine?

Any information on this would be appreciated.

Kind regards,

Jerome
dormando
Re: Is memcache add() atomic on a multithreaded memcached?
October 13, 2010 01:47PM
> Is it possible that a multithreaded memcached returns true for two
> concurrent add() calls on the same key if the requests are handled by two
> different threads on the same machine?

It should not be possible, no. Be sure you've disabled the client
"failover" code.
Adam Lee
Re: Is memcache add() atomic on a multithreaded memcached?
October 13, 2010 07:11PM
Yeah, we've also used this as a sort of crude locking mechanism on a site
under fairly heavy load and have never seen any sort of inconsistency; as
dormando said, I'd make sure your configuration is correct. Debug and make
sure that both clients are indeed setting it on the same server. Or, if
that's not possible, whip up a small script that iterates through all of
your servers and checks whether the key exists on more than one of them,
something like the sketch below.
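Something quick along these lines (hypothetical; assumes pymemcache, and
the server list and key are placeholders for your real ones):

from pymemcache.client.base import Client

SERVERS = [("10.0.0.1", 11211), ("10.0.0.2", 11211)]  # your real list here
KEY = "lock:12345"                                    # the key to debug

hits = []
for host, port in SERVERS:
    c = Client((host, port))      # talk to each node directly,
    if c.get(KEY) is not None:    # bypassing the consistent-hashing client
        hits.append("%s:%d" % (host, port))
    c.close()

print("key %s found on: %s" % (KEY, ", ".join(hits) or "no server"))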

On Wed, Oct 13, 2010 at 1:47 PM, dormando <[email protected]> wrote:

> It should not be possible, no. Be sure you've disabled the client
> "failover" code.



--
awl
moses wejuli
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010 12:20AM
.... or you could use a concatenation of your server ID, timestamp, query,
unique client variable(s), session, etc. (all hashed) as part of your
(hashed) key. There are countless ways to make your key unique, even in
your situation!
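Just as an illustration (Python; all the inputs are made-up placeholders):

import hashlib
import socket
import time

# combine whatever makes the request unique for you
parts = "%s|%s|%s" % (socket.gethostname(), time.time(), "client-session-id")
key = "lock:" + hashlib.md5(parts.encode()).hexdigest()
print(key)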

On 13 October 2010 19:11, Adam Lee <[email protected]> wrote:

> Yeah, we've also used this as a sort of crude locking mechanism on a site
> under fairly heavy load and have never seen any sort of inconsistency;
> [...]
elSchrom
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
Thanks for your replies so far. Failover is deactivated in our
configuration, so that cannot be the reason. I should write a little more
about the circumstances:

Our 50+ node consistent-hashing cluster is very reliable in normal
operation; incr/decr, get, set, multiget, etc. are not a problem. If we
had keys landing on the wrong servers in the continuum, we would expect
many more problems than we currently see.
The cluster is always under relatively high load (the number of
connections, for example, is very high because of the 160+ webservers in
front of it). In a very few cases we now observe that this locking
mechanism does not work: two different clients acquire the lock with the
same key. (To prevent multiple inserts into the database on the same
primary key, you have to explicitly use one key shared by all clients,
not a key with client-unique parts in it.) It works millions of times as
expected (we generate a large number of user-triggered database inserts,
~60/sec, with this construct), but a handful of locks fail and show the
behaviour described. So my question again: is it conceivable (even if
very implausible) that a multithreaded memcached does not provide a 100%
atomic add()?

Kind regards,

Jerome
dormando
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010 10:00AM
> So my question again: is it conceivable (even if very implausible) that
> a multithreaded memcached does not provide a 100% atomic add()?

restart memcached with -t 1 and see if it stops happening. I already said
it's not possible.
elSchrom
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
On 14 Oct., 10:00, dormando <dorma...@rydia.net> wrote:

> restart memcached with -t 1 and see if it stops happening. I already said
> it's not possible.

Yeah, right. :-) Restarting all memd instances is not an option. Can
you explain why it is not possible?
dormando
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010 10:31AM
> Yeah, right. :-) Restarting all memd instances is not an option. Can
> you explain why it is not possible?

Because we've programmed the commands with the full intent to be atomic.
If it's not, there's a bug... there's an issue with incr/decr that's been
fixed upstream but we've never had a reported issue with add.

I'm not sure what you want to hear. "They're supposed to be atomic, yes."
- that much is in the wiki too.
Dieter Schmidt
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010 11:40AM
To me it sounds like a configuration problem on the webservers or an
availability/accessibility issue. If all machines are accessible, the
locking key resides on machine X. If one of the webservers differs in its
configuration, the key can be added a second time, as new, somewhere else
in the continuum. As a result you get a second insert into your db.

What do you think? Possible?


elSchrom <[email protected]> schrieb:

>
>
>On 14 Okt., 10:00, dormando <dorma...@rydia.net> wrote:
>> > our 50+ consistent hashing cluster is very reliable on normal
>> > operations, incr/decr, get, set, multiget, etc. is not a problem. If
>> > we have a problem with keys on wrong servers in the continuum, we
>> > should have more problems, which we currently have not.
>> > The cluster is always under relatively high load (the number of
>> > connections for example is very high due to 160+ webservers in the
>> > front). We are now expecting in a very few cases, that this
>> > locking mechanism does not work. Two different clients try to lock the
>> > with the same object (if you want to prevent multiple inserts in a
>> > database on the same
>> > primary key you have to explicitly set one key valid for all clients
>> > and not a key with unique hashes in it), it works millions of times as
>> > expected (we are generating a large number of user triggered database
>> > inserts (~60/sec.)
>> > with this construct). But a handful of locks does not work and shows
>> > the behaviour described. So now my question is again: is it thinkable
>> > (even if it is very implausible), that
>> > a multithreaded memd does not provide 100% sure atomic add()?
>>
>> restart memcached with -t 1 and see if it stops happening. I already said
>> it's not possible.
>
>Yeah, right. :-) Restarting all memd instances is not an option. Can
>you explain, why it is not possible?
elSchrom
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
On 14 Oct., 10:31, dormando <dorma...@rydia.net> wrote:
> > Yeah, right. :-) Restarting all memd instances is not an option. Can
> > you explain why it is not possible?
>
> Because we've programmed the commands with the full intent to be atomic.
> If it's not, there's a bug... there's an issue with incr/decr that's been
> fixed upstream but we've never had a reported issue with add.
>
> I'm not sure what you want to hear. "They're supposed to be atomic, yes."
> - that much is in the wiki too.

I had assumed that you designed memd to behave exactly the same with one
or many threads, and it's good to hear that there is no pending bug
concerning the atomicity of add() across multiple threads. The reason
someone posts such a thing on the mailing list is to hear the opinion of
a dev who has all the insight. :-)
So please understand my obstinate behaviour.
We are planning to run some tests on this; maybe I can provide more
detail in the future. But it will be hard to prove a bug in this
scenario. For that we have to build a test setup with multiple instances
trying to do an add() on the same key at exactly the same time on a
consistent-hashing cluster, something like the sketch below.
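One round of such a test might look like this (a rough sketch; pymemcache
and the host are illustrative, the key would have to be deleted between
rounds, and it would need to run for a long time to catch a race this
rare):

import multiprocessing
import time
from pymemcache.client.base import Client

KEY = "lock:test"
WORKERS = 50

def worker(start_at, results):
    mc = Client(("memcached-host", 11211))  # illustrative host
    while time.time() < start_at:
        pass                                # spin so all workers fire at once
    results.put(bool(mc.add(KEY, b"1", expire=5, noreply=False)))

if __name__ == "__main__":
    results = multiprocessing.Queue()
    start_at = time.time() + 2.0
    procs = [multiprocessing.Process(target=worker, args=(start_at, results))
             for _ in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    wins = sum(results.get() for _ in range(WORKERS))
    print("add() returned True %d times (expected: 1)" % wins)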
elSchrom
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
Hi Dieter,

On 14 Oct., 11:39, Dieter Schmidt <flatl...@stresstiming.de> wrote:
> To me it sounds like a configuration problem on the webservers or an
> availability/accessibility issue. If one of the webservers differs in
> its configuration, the key can be added a second time, as new, somewhere
> else in the continuum. As a result you get a second insert into your db.
>
> What do you think? Possible?


Possible for sure, but that should produce more problems, like massively
redundant cached items, because some clients would be hashing against a
different continuum. That is most likely not happening. The current
failure rate is below 0.0001%, and the failures appear on different
frontend servers. It feels like something very unlikely is happening
here: a massive number of add() calls with a very small number of
failures.

Dieter Schmidt
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010 02:00PM
What happens if the add command fails because of an unlikely network error?

elSchrom <[email protected]> schrieb:

>Hi Diez,
>
>On 14 Okt., 11:39, Dieter Schmidt <flatl...@stresstiming.de> wrote:
>> For me it sounds like a configuration problem on the webservers or an availability/accessability issue.
>> If for example all machines are accessable the locking key resides on maschine x. If one of the servers webservers differers in cfg it can happen that the key is added a second time as new somewhere else in the continuum. As result you will have a second insert into your db.
>>
>> What do you think? Possible?
>
>
>Possible for sure, but this should produce more problems like massive
>redundant cached items, because some clients have a different type of
>continuum. This is most likely not happening. The current failure rate
>is smaller 0,0001% and they appear on different frontend-servers. It
>feels like a very unlikely thing is happening here due to a massive
>number of used add(), with a very rare number of failures.
>
>>
>> elSchrom <jerome.p...@googlemail.com> schrieb:
>>
>>
>>
>> >On 14 Okt., 10:00, dormando <dorma...@rydia.net> wrote:
>> >> > our 50+ consistent hashing cluster is very reliable on normal
>> >> > operations, incr/decr, get, set, multiget, etc. is not a problem. If
>> >> > we have a problem with keys on wrong servers in the continuum, we
>> >> > should have more problems, which we currently have not.
>> >> > The cluster is always under relatively high load (the number of
>> >> > connections for example is very high due to 160+ webservers in the
>> >> > front). We are now expecting in a very few cases, that this
>> >> > locking mechanism does not work. Two different clients try to lock the
>> >> > with the same object (if you want to prevent multiple inserts in a
>> >> > database on the same
>> >> > primary key you have to explicitly set one key valid for all clients
>> >> > and not a key with unique hashes in it), it works millions of times as
>> >> > expected (we are generating a large number of user triggered database
>> >> > inserts (~60/sec.)
>> >> > with this construct). But a handful of locks does not work and shows
>> >> > the behaviour described. So now my question is again: is it thinkable
>> >> > (even if it is very implausible), that
>> >> > a multithreaded memd does not provide 100% sure atomic add()?
>>
>> >> restart memcached with -t 1 and see if it stops happening. I already said
>> >> it's not possible.
>>
>> >Yeah, right. :-) Restarting all memd instances is not an option. Can
>> >you explain, why it is not possible?
elSchrom
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
On 14 Oct., 14:01, Dieter Schmidt <flatl...@stresstiming.de> wrote:
> What happens if the add command fails because of an unlikely network error?

The situation is: two different clients do an add() with the same key at
the same time, and both get true. Assuming the key lives on the same
machine, that would have to be a threading problem or a bug in add(), and
it breaks the atomic behaviour we expect. But we cannot prove that the
key was on the same server at that moment, because the situation is
highly volatile. It is just speculation; if keys were being stored on the
wrong servers because of consistent-hashing problems, we should see far
more problems.

dormando
Re: Is memcache add() atomic on a multithreaded memcached?
October 14, 2010
> We are planning to run some tests on this; maybe I can provide more
> detail in the future. But it will be hard to prove a bug in this
> scenario.

Can you give more info about exactly what the app is doing? What version
you're on as well? I can squint at it again and see if there's some minute
case.

Need to know exactly what you're doing though. How long the key tends to
live, how many processes are hammering the same key, what you're setting
the timeout to, etc.

Your behavior's only obstinate because you keep asking if we're sure it's
atomic. Yes, it's supposed to be atomic; if you think you've found a bug,
let's talk about bug hunting :P
Tobias
Re: Is memcache add() atomic on a multithreaded memcached?
October 15, 2010 05:45AM
> Can you give more info about exactly what the app is doing?

Something like this:

value = memcache.get("record" + x)

if (false == value && memcache.add("lock" + x, "1", 60)) {

    // compute the (expensive) record
    // insert the record with primary key x into the DB
    memcache.set("record" + x, record);
    memcache.delete("lock" + x);

} else {
    // someone else is doing the expensive stuff
}

In a very few cases (<20 out of 3 million) we observed a "Duplicate
entry" MySQL error.
Adam Lee
Re: Is memcache add() atomic on a multithreaded memcached?
October 15, 2010
Is it ever possible that your compute takes longer than your timeout?

On Fri, Oct 15, 2010 at 5:45 AM, Tobias <[email protected]> wrote:

> In a very few cases (<20 out of 3 million) we observed a "Duplicate
> entry" MySQL error.


--
awl
Tobias
Re: Is memcache add() atomic on a multithreaded memcached?
October 17, 2010 06:07AM
> Is it ever possible that your compute takes longer than your timeout?

No; the return value of memcache.delete("lock" + x) is true.
Les Mikesell
Re: Is memcache add() atomic on a multithreaded memcached?
October 17, 2010 06:10PM
On 10/17/10 6:07 AM, Tobias wrote:
>> Is it ever possible that your compute takes longer than your timeout?
>
> No; the return value of memcache.delete("lock" + x) is true.

But wouldn't that also be true if another process found the expired lock and set
a new one?
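To even detect that case you'd want a unique token in the lock value and
a check before deleting it; roughly like this (a sketch -- the client and
names are illustrative, and a small race still remains between the get
and the delete):

import logging
import uuid
from pymemcache.client.base import Client

log = logging.getLogger(__name__)
mc = Client(("memcached-host", 11211))  # illustrative host

def insert_with_lock(x, work):
    token = uuid.uuid4().hex.encode()
    if not mc.add("lock" + x, token, expire=60, noreply=False):
        return False                     # someone else holds the lock
    try:
        work()                           # the expensive compute + DB insert
    finally:
        # Only delete if the lock still holds our token. If it expired and
        # another process re-acquired it, deleting blindly would release
        # *their* lock without anyone noticing.
        if mc.get("lock" + x) == token:
            mc.delete("lock" + x)
        else:
            log.warning("lock for %s expired while we held it", x)
    return True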

--
Les Mikesell
lesmikesell@gmail.com