Discussion: Clock accuracy.

Posted by dormando 
dormando
Discussion: Clock accuracy.
June 30, 2010 09:00PM
Yo,

Something I've been wondering about for a while is a class of bug that we
have with clock skew.

Memcached handles the expiration timer via:

- "process_started" timestamp in seconds, which gets initialized at
startup
- "current_time" which, once per second, gets set to the delta between
the current time and "process_started"

If your clock swings around wildly there're a few situations where you
could potentially end up with items expiring immediately or never, such as
current_time ending up underflowing.
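
To make the failure mode concrete, here is a minimal sketch of the scheme described above. It is not memcached's actual code: the type `rel_time_t`, the helpers `clock_init`, `update_current_time` and `is_expired`, and the choice of an unsigned 32-bit counter where 0 means "never expires" are all assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Assumed type: seconds relative to process start, unsigned 32-bit. */
typedef uint32_t rel_time_t;

static time_t process_started;   /* wall-clock seconds noted at startup */
static rel_time_t current_time;  /* seconds elapsed since process_started */

/* Called once at startup with the wall-clock time. */
static void clock_init(time_t now) {
    process_started = now;
    current_time = 0;
}

/* Called once per second with a fresh wall-clock reading.
 * If `now` is earlier than process_started, the negative delta
 * wraps around to a huge unsigned value. */
static void update_current_time(time_t now) {
    current_time = (rel_time_t)(now - process_started);
}

/* Assumed convention: exptime 0 means "never expires". */
static int is_expired(rel_time_t exptime) {
    return exptime != 0 && exptime <= current_time;
}
```

If the wall clock is stepped back to before the startup timestamp, current_time wraps to a value close to 2^32 and every item with a normal expiry looks expired at once. Going the other way, an item stored while current_time is wrapped gets an expiry near 2^32, which is effectively "never" once the clock is corrected.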

A couple easy ideas off the top of my head that would trade some accuracy
for avoiding timers (and any cross-platform timer idiocy):

- Ditch "process_started" and kick off a counter at 0. Every second the
current_time would be incremented by 1. A relative timeout of "60 seconds
from now" would be set to "current_time + 60" as it presently is. We'd
have to do something special for date formatted expirations, potentially
by noting the exact time once on startup and subtracting that from a
provided date to get the delta-in-seconds. The latter can still be
influenced by a bad clock, but maybe not as noticeably, and the feature is
less used.

- Add some sanity checks in the clock update function, which will fall
back to incrementing by 1 if it detects a significant clock correction
forward, or if it's gone back in time. Still uses gettimeofday() unless
something goes wrong, keeps plodding forward less accurately when
something does go wrong.

- Use some anti-clock-skew magic that maybe libevent uses. Need to
research more options :P
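
The first two ideas can be sketched together, under some assumptions. Everything named here is hypothetical and not taken from memcached: `clock_tick`, `realtime_to_rel`, `wall_at_start`, and the `MAX_FORWARD_JUMP` threshold of 5 seconds are placeholders. The counter starts at 0, uses the wall-clock delta while it looks sane, and falls back to incrementing by 1 when the clock has gone backwards or jumped forward. The pure counter idea is the special case where the fallback branch is always taken.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t rel_time_t;

/* Hypothetical threshold: a forward jump larger than this many
 * seconds between ticks is treated as a clock correction. */
#define MAX_FORWARD_JUMP 5

static time_t last_wall;         /* wall-clock reading at the previous tick */
static time_t wall_at_start;     /* wall-clock reading noted once at startup */
static rel_time_t current_time;  /* starts at 0, only ever moves forward */

static void clock_init(time_t now) {
    last_wall = now;
    wall_at_start = now;
    current_time = 0;
}

/* Called once per second. Trusts the wall-clock delta when it looks
 * sane, otherwise advances by exactly one second. */
static void clock_tick(time_t now) {
    time_t delta = now - last_wall;
    if (delta < 0 || delta > MAX_FORWARD_JUMP) {
        delta = 1;  /* went back in time or jumped ahead: plod forward */
    }
    current_time += (rel_time_t)delta;
    last_wall = now;
}

/* Convert a date formatted (absolute) expiration into relative seconds
 * using the startup timestamp. Dates at or before startup map to 1,
 * the smallest nonzero value, assuming 0 is reserved for "never". */
static rel_time_t realtime_to_rel(time_t abs_exptime) {
    if (abs_exptime <= wall_at_start) {
        return 1;
    }
    return (rel_time_t)(abs_exptime - wall_at_start);
}
```

Relative timeouts stay "current_time + 60". The date conversion still depends on the clock having been right at startup, which matches the caveat in the first idea.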

Anyone care? The increase in the number of these types of reports is
getting obnoxious, and cloud computing's god-awful-ness can only make it
worse.

-Dormando
Brian Moon
Re: Discussion: Clock accuracy.
July 12, 2010 06:50AM
We have experienced this on a server where ntpd just decided to stop
working for days without us realizing it. I think starting at 0 is a sane
idea.

Brian.
--------
http://brian.moonspot.net/
