Using PCIe SSDs instead of RAM

Posted by Marten Lehmann 
Marten Lehmann
Using PCIe SSDs instead of RAM
July 09, 2010 09:20PM
Hello,

I know that memcached is designed to get its speed from the fast
access to RAM. But RAM is still very expensive - even with the amount
of RAM you get for the same money increasing every year.

When I thought of using PCIe SSDs instead of RAM, I wasn't thinking
about persistence of objects. I just noticed that Fusion-io's ioDrives
work at near-RAM speed, with the PCIe bus as the only bottleneck
(don't confuse them with SATA SSDs). An ioDrive 160 GB with SLC memory
is available for less than $6,000 and is capable of more than 100,000
random IOPS (read and write), whereas with ECC RAM you'd have to pay a
multiple of that amount to get the same resources.

I don't know of any way to use a block device (like the ioDrive) as
RAM; you can only use RAM as a block device (which doesn't help in
this situation). So for the emerging market of PCIe SSDs (many high
performance databases are using them as a replacement for RAID 10 arrays
and large RAM) it would be necessary to extend or branch memcached to
support SSD block devices.

Has anyone started on that, is this possibly already on the roadmap,
or did the maintainers refuse to extend memcached with this option for
a reason?

Btw.: We are using memcached in conjunction with nginx as a web proxy
to our backend webservers to cache images and other static files,
which improves performance a lot. But 64 GB of RAM is much more
expensive than 160 GB of an ioDrive PCIe SSD.

Kind regards
Marten
Mitch
Re: Using PCIe SSDs instead of RAM
July 14, 2010 07:50AM
Hi Marten!

I have developed a patch for memcached 1.4.x that splits memcached's
slab store into metadata and data parts, so that the keys/values can
live on flash without a tremendous performance penalty. Ultimately, I
predict the best solution will be to use the storage engine branch
and/or Northscale's membase, but for the time being the patch works
pretty well. I'll send you a private email with more info.

thanks!
Mitch (from Fusion-io)

Artur Ejsmont
Re: Using PCIe SSDs instead of RAM
July 14, 2010 02:50PM
That actually sounds like an awesome idea!

memcached is great, but having persistence would give it a whole new quality!

Storing sessions or whatever state you need would be much more reliable.
The space available would grow tenfold as well :-)

Great idea, I would love to see it as an option in production-stable memcached!

art

David Raccah
Re: Using PCIe SSDs instead of RAM
July 14, 2010 03:00PM
Can you also send me your patch? We have been waiting for the storage
engine, but we are not close to maxing out our systems yet.

Thanks,
David

Guille -bisho-
Re: Using PCIe SSDs instead of RAM
July 21, 2010 02:00PM
Many memcached users are more interested in latency than in huge
amounts of memory to cache. The drive you mention has a latency of 26 µs [1],
compared with ~22.5 ns for DRAM [2]: about 3 orders of magnitude more.
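
As a rough check of that comparison, here is a small sketch using only the two figures quoted above. The number of lookups per page is an assumed value for illustration, and network round trips (which are the same in both cases) are ignored.

```python
import math

# Figures quoted in the post: ioDrive access latency vs. DRAM timing.
ssd_latency_s = 26e-6     # 26 microseconds
dram_latency_s = 22.5e-9  # ~22.5 nanoseconds

# How many times slower the flash access is, and the order of magnitude.
ratio = ssd_latency_s / dram_latency_s
orders_of_magnitude = math.log10(ratio)

# Assumption for illustration: a page that issues 200 sequential lookups.
lookups_per_page = 200
added_latency_ms = lookups_per_page * (ssd_latency_s - dram_latency_s) * 1e3

print(f"ratio: {ratio:.0f}x, log10: {orders_of_magnitude:.2f}")
print(f"extra storage latency per page: {added_latency_ms:.2f} ms")
```

Whether a few extra milliseconds per page matter depends on the use case, which is why the question below is the important one.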

If your application accesses a lot of small memcached items to
process a page, the increased latency will be noticeable. If it's just
for caching full content (full pages) it might be interesting, but
then you might be more interested in something like Varnish, which uses
regular disks and memory as cache.

What is your use case?

[1] http://www.fusionio.com/products/iodrive/?tab=specs
[2] http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Memory_timing

--
Guille -ℬḭṩḩø- <[email protected]>
:wq
Jakub Łopuszański
Re: Using PCIe SSDs instead of RAM
July 22, 2010 06:40PM
I see that my patch for garbage collection is still being ignored, and your
post gives me some idea about why that is.
I think that RAM is a real problem, because currently (without GC) you have
no clue how much RAM you really need. So you can end up blindly buying
more and more machines, which effectively means that multiget works worse
and worse (the client issues one big multiget, but it gets split into many
packets to many servers).
Currently we are trying to reduce the number of servers in the cluster, based
on real consumption, to get more out of the multiget feature.

So I believe that there is an important connection between RAM and speed,
and this connection is the number of servers in the cluster.
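
To put a rough number on the multiget fan-out described above, here is a minimal sketch. It assumes every key hashes uniformly and independently to one of the servers, which is a simplification; the function name and the sample cluster sizes are made up for illustration.

```python
def expected_servers_hit(num_servers: int, num_keys: int) -> float:
    """Expected number of distinct servers one multiget touches.

    Assumes each key is hashed uniformly and independently to a server.
    """
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    # Probability that one particular server receives none of the keys.
    untouched = (1 - 1 / num_servers) ** num_keys
    return num_servers * (1 - untouched)


# The same 20-key multiget on clusters of growing size (illustrative sizes).
for n in (4, 16, 64):
    print(n, "servers ->", round(expected_servers_hit(n, 20), 1), "contacted")
```

Under that assumption, the number of servers contacted per multiget keeps growing as the cluster grows, until nearly every key goes to a different server.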

Brian Moon
Re: Using PCIe SSDs instead of RAM
July 22, 2010 07:00PM
On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
> I see that my patch for garbage collection is still being ignored, and
> your post gives me some idea about why it is so.
> I think that RAM is a real problem, because currently (without GC) you
> have no clue about how much RAM you really need. So you can end up
> blindly buying more and more machines, which effectively means that
> multiget works worse and worse (client issues one big multiget but it
> gets split into many packets to many servers).
> Currently we try to get number of servers in the cluster smaller based
> on the reall consumption to get more from multiget feature.

I would never, never, never want my memcached daemon's RAM usage to
fluctuate wildly. Eviction rate is a much better indicator of how
well your cache is being used.

--

Brian.
--------
http://brian.moonspot.net/
Jakub Łopuszański
Re: Using PCIe SSDs instead of RAM
July 22, 2010 09:10PM
Well, I beg to differ.
We used to have evictions > 0, actually around 200 (per whatever unit munin
counts them in), so we used to think that we had too few machines, and kept
adding them.
After applying the patch, memory usage dropped by 80%, and we have had no
evictions for a long time, which means that the evictions were misleading
and happened just because the LRU sometimes kills fresh items even though
there are lots of outdated keys.

Moreover, it's not like RAM usage "fluctuates wildly". It's roughly constant,
or at least periodic, so you can tell very accurately if something bad
happened, as it would be instantly visible as a deviation from yesterday's
charts. Before applying the patch, you might as well not have looked at the
chart at all, as it was certain to show 100% usage, which in
my opinion gives no clue about what is actually going on.

Even if you are afraid of "wildly fluctuating" charts, you will not solve
the problem by hiding it, and that is what actually happens if you don't
have GC -- the traffic and the number of outdated keys all fluctuate, but
you just don't see it if the chart always shows 100% usage...

Brian Moon
Re: Using PCIe SSDs instead of RAM
July 22, 2010 09:10PM
On 7/22/10 2:02 PM, Jakub Łopuszański wrote:
> Well, I beg to differ.
> We used to have evictions > 0, actually around 200 (per whatever munin
> counts them), so we used to think, that we have too small number of
> machines, and kept adding them.
> After using the patch, the memory usage dropped by 80%, and we have no
> evictions since a long time, which means, that evictions where
> misleading, and happened just because LRU sometimes kills fresh items,
> even though there are lots of outdated keys.

Let me make sure I understand your claim here. You are claiming that the
LRU is evicting things even though there are expired items in the slabs?
And that expired items are left in the slabs and non-expired items are
removed from the slab by the LRU? That is your claim? I just want to be
clear.

> Moreover it's not like RAM usage "fluctuates wildly". It's kind of
> constant, or at least periodic, so you can very accurately say if
> something bad happened, as it would be instantly visible as a deviation
> from yesterday's charts. Before applying the patch, you could as well
> not look at the chart at all, as it was more than sure that it always
> shows 100% usage, which in my opinion gives no clue about what is
> actually going on.
>
> Even if you are afraid of "wildly fluctuating" charts, you will not
> solve the problem by hiding it, and this is what actually happens if you
> don't have GC -- the traffic, the number of outdated keys, they all
> fluctuate, but you just don't see it, if the chart always shows 100%
> usage...

It has nothing to do with fear. It has to do with managing resources. A
sudden peak in evictions is much better than a sudden lack of memory on
all my memcached servers. Evictions > OOM.

--

Brian.
--------
http://brian.moonspot.net/
dormando
Re: Using PCIe SSDs instead of RAM
July 23, 2010 12:20AM
http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving

Think I'll write a separate page about managing memory, based off of the
slides from my mysqlconf presentation about monitoring memcached...

We're not ignoring you, the patch is against what the LRU is designed for.
Several people have argued to put garbage collection back into memcached,
but it just doesn't mix.

In the interest of being constructive, you should look back through the
mailing list for details on the storage engine branch, and if you really
want it to work, it'd be a good exercise to implement this as a custom
storage engine.

In the interest of being thorough: you proved your own patch unnecessary
by noting that the hitrate did not change. It just confirmed you weren't
having a problem.

The short notes of my slides are just:

- Note evictions over time
- Note hitrate over time
- Investigate changes to either via a traffic snapshot from maatkit,
either on your memcached server or from an app server. Or set up one app
server to log its memcached traffic. Whatever you need to do.
- Note your DB load as well, and correlate *all* of these numbers.

You'll get way more useful information out of the *flow* through memcached
than from *what's inside it*. What's inside it doesn't matter, at all!
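
A minimal sketch of watching the flow rather than the contents: it takes two snapshots of the text returned by memcached's `stats` command and reports hitrate and evictions for the interval between them. The counters `get_hits`, `get_misses` and `evictions` are real stat names; the helper names and the sample numbers are made up for illustration, and collecting the snapshots (e.g. over a socket or from munin) is left out.

```python
def parse_stats(raw: str) -> dict:
    """Turn 'STAT name value' lines from the stats command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def flow_between(before: dict, after: dict) -> dict:
    """Hitrate and eviction count for the interval between two snapshots."""
    hits = int(after["get_hits"]) - int(before["get_hits"])
    misses = int(after["get_misses"]) - int(before["get_misses"])
    evictions = int(after["evictions"]) - int(before["evictions"])
    total = hits + misses
    return {
        "hit_rate": hits / total if total else None,
        "evictions": evictions,
    }


# Made-up sample snapshots, only to show the shape of the input.
snapshot_1 = "STAT get_hits 1000\r\nSTAT get_misses 200\r\nSTAT evictions 5\r\nEND\r\n"
snapshot_2 = "STAT get_hits 1900\r\nSTAT get_misses 300\r\nSTAT evictions 9\r\nEND\r\n"

flow = flow_between(parse_stats(snapshot_1), parse_stats(snapshot_2))
print(flow)
```

Charting these per-interval values next to DB load gives the correlation described in the notes above.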

Keep your hitrate stable, investigate what your app is doing when it
changes. If there's nothing for you to fix and the hitrate is dropping, db
load is increasing, add more memcached servers. It's really really simple.
Honestly! Looking at just one stat and making that decision is pretty
weird.

In your case, you were seeing evictions despite 50% of your memory being
loaded with expired items. Neither of these things is a problem or even
matters, because:

- expired items are freed when they're fetched
- evicted items are picked off of the tail of the LRU

which means that *neither* the expired items nor the evicted items are
being accessed at all. You have unexpired items which are being accessed
less frequently than stuff that's being expired!
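
The two rules above can be illustrated with a toy model. This is only a sketch of the behaviour as described in this thread (expiry checked on fetch, eviction from the least recently used end); it is not memcached's actual slab allocator or item code, and the class and key names are invented for the example.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Toy model: lazy expiry on fetch, eviction from the LRU tail."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # key -> (value, expires_at); least recently used entry comes first.
        self.items = OrderedDict()
        self.evictions = 0

    def set(self, key, value, ttl, now):
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            # Evict the least recently used item, whether expired or not.
            self.items.popitem(last=False)
            self.evictions += 1
        self.items[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.items[key]  # expired items are freed when fetched
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return value


cache = ToyLRUCache(capacity=3)
cache.set("long", "a", ttl=1000, now=0)   # long TTL, never read again
cache.set("short1", "b", ttl=1, now=1)    # expires at t=2
cache.set("short2", "c", ttl=1, now=2)    # expires at t=3
cache.set("new", "d", ttl=1000, now=10)   # cache is full: "long" is evicted

print(cache.evictions, list(cache.items))
```

In this model the unexpired key "long" is evicted while two expired keys still occupy space, simply because nothing has touched any of them since they were set.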

It *could* indicate a problem, but simply garbage collecting will actually
*hide* it from you! You'll find it by analyzing your misses and sets. You
might then see that your app is uselessly setting hundreds of keys every
time a user loads their profile, or frontpage, or whatever. Those keys
then expire without ever being used again.

That should lead you into a *real* benefit of not wasting time setting
extraneous keys, or fetching keys that never exist, or finding places to
combine data or issue multigets more correctly.

With respect to your multiget note, I went over this in quite a bit of
detail: http://dormando.livejournal.com/521163.html

If you're multiget'ing related data, there's zero reason for it to hit
more than one memcached instance. Except maybe you're fetching mass
numbers of huge keys and it makes more sense for the TCP sessions to be
split up in parallel. I dunno.
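
One way to keep related keys on a single instance is to pick the server from a shared routing key instead of from each full key. The sketch below shows the idea only; the server list, key layout and function names are hypothetical, and the modulo hashing is a stand-in for whatever distribution your client library actually uses.

```python
import hashlib

# Hypothetical server list, for illustration only.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def server_for(routing_key: str) -> str:
    """Pick a server from a hash of the routing key (simple modulo scheme)."""
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def plan_multiget(user_id: int, fields: list) -> dict:
    """Map all of one user's keys to a single server via a shared routing key."""
    routing_key = f"user:{user_id}"
    keys = [f"user:{user_id}:{field}" for field in fields]
    return {server_for(routing_key): keys}


plan = plan_multiget(42, ["name", "avatar", "prefs"])
print(plan)
```

Because every key for user 42 is routed by "user:42", the whole multiget goes to one instance regardless of how many servers are in the cluster.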

On one final note, I'd really really appreciate it if you could stop
hijacking threads to promote your patch. It's pretty rude, as your garbage
collector issue has been discussed on the list several times.

Jakub Łopuszański
Re: Using PCIe SSDs instead of RAM
July 23, 2010 08:50AM
While I agree with most of your points, I can't see how GC is against the
LRU.

I agree that frequently accessed keys with a short TTL seem strange, and so do
rarely accessed keys with a long TTL. But there are lots of perfectly good
reasons to have such a situation, and we do.
GC does not work against the LRU (at least I can't see how); it cooperates.
Apparently the LRU is then never used, because you have a smaller chance of
running out of memory, but I'd like to address Brian Moon's doubts: if the
whole memory is occupied you will not get a "sudden lack of memory", just the
usual thing: the LRU will start to evict the oldest items.
I agree that monitoring hitrates and evictions makes sense, but you can
forecast problems much sooner if you monitor the number of unexpired items as
well.
The point is: GC does not forbid you from using your regular monitoring
tools, skills and procedures. It just gives you another tool: live
monitoring of unexpired items.
I see nothing bad about it :)

Scenario 1. You are releasing a new feature, and you want to scale the number
of servers according to the load. You can monitor memory usage as the
users join, extrapolate, and order new machines much sooner than by
monitoring evictions, as evictions indicate that you already have a problem.
Scenario 2. You need to steal machines from one cluster to help build
another one, and you have to decide if you can do so safely without risking
that the old cluster will "run out of memory". Again, monitoring evictions
cannot reliably tell you how many machines you can remove from the cluster,
while monitoring memory gives you perfectly accurate info.
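
The extrapolation in Scenario 1 could look roughly like the sketch below. It assumes growth is close to linear and fits a least-squares line through daily usage samples; the function name, units and sample values are made up for illustration.

```python
def days_until_full(samples, capacity):
    """Estimate days from the last sample until usage reaches capacity.

    samples: list of (day, memory_used) pairs, assumed to grow roughly linearly.
    Returns None if usage is flat or shrinking, or the fit is impossible.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sxy / sxx
    if slope <= 0:
        return None  # not growing, nothing to forecast
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return day_full - samples[-1][0]


# Made-up samples: (day, GB of unexpired data) against a 100 GB cluster.
usage = [(0, 10), (1, 20), (2, 30)]
print(days_until_full(usage, capacity=100))
```

A forecast like this is only meaningful if the measured value excludes expired items, which is the point being argued in the scenarios above.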


dormando
Re: Using PCIe SSDs instead of RAM
July 23, 2010 08:50AM
I tried.

Try the engine branch?

On Fri, 23 Jul 2010, Jakub Łopuszański wrote:

> While I agree with most of your thesis, I can't see how GC is against the LRU.
>
> I agree, that often accessed keys with short TTL seem strange, and so do rarely accessed keys with long TTL. But there are lots of perfect reasons to have such situation, and we do.
> GC does not work against the LRU (at least I can't see it), it cooperates. Apparently LRU is never used, because you have smaller chances to run out of memory, but I'd like to answer doubts of Brian Moon:
> in case whole memory is occupied you will not get "sudden lack of memory", but just the usuall thing: LRU will start to evict oldest items.
> I agree that monitoring hitrates and evictions makes sens, but you can forcast problems much sooner if you monitor number of unexpired items, as well.
> The point is: GC does not forbid you from using your regular monitoring tools, skills and procedures. It just gives you another tool: live monitoring of unexpired items.
> I see nothing bad about it:)
>
> Scenario 1. You are releasing new feature, and you want to scale the number of servers accordingly to the load. You can monitor memory usage as the users join, extrapolate, and order new machines much
> sooner, than by monitoring evictions, as evictions indicate that you already have a problem.
> Scenario 2. You need to steal machines from one cluster to help build another one, and you have to decide if you can do so safely without risking that the old cluster will "run of memory". Again monitoring
> evictions can not reliably tell you how many machines can you remove from the cluster, while monitoring memory gives you perfectly accurate info.
>
>
> On Fri, Jul 23, 2010 at 12:12 AM, dormando <[email protected]> wrote:
> http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving
>
> Think I'll write a separate page about managing memory, based off of the
> slides from my mysqlconf presentation about monitoring memcached...
>
> We're not ignoring you, the patch is against what the LRU is designed for.
> Several people have argued to put garbage collection back into memcached,
> but it just doesn't mix.
>
> In the interest of being constructive, you should look back through the
> mailing list for details on the storage engine branch, and if you really
> want it to work, it'd be a good exercise to implement this as a custom
> storage engine.
>
> In the interest of being thorough; you proved your own patch unnecessary
> by noting that the hitrate did not change. It just confirmed you weren't
> having a problem.
>
> The short notes of my slides are just:
>
> - Note evictions over time
> - Note hitrate over time
> - Investigate changes to either via a traffic snapshot from maatkit,
> either on your memcached server or from an app server. Or setup one app
> server to log its memcached traffic. whatever you need to do.
> - Note your DB load as well, and correlate *all* of these numbers.
>
> You'll get way more useful information out of the *flow* through memcached
> than from *what's inside it*. What's inside it doesn't matter, at all!
>
> Keep your hitrate stable, investigate what your app is doing when it
> changes. If there's nothing for you to fix and the hitrate is dropping, db
> load is increasing, add more memcached servers. It's really really simple.
> Honestly! Looking at just one stat and making that decision is pretty
> weird.
>
> In your case, you were seeing evictions despite 50% of your memory being
> loaded with expired items. Neither of these things are a problem or even
> matter, because:
>
> - expired items are freed when they're fetched
> - evicted items are picked off of the tail of the LRU
>
> which means that *neither* the expired items or the evicted items are
> being accessed at all. You have unexpired items which are being accessed
> less frequently than stuff that's being expired!
>
> It *could* indicate a problem, but simply garbage collecting will actually
> *hide* it from you! You'll find it by analyzing your miss's and set's. You
> might then see that your app is uselessly setting hundreds of keys every
> time a user loads their profile, or frontpage, or whatever. Those keys
> then expire without ever being used again.
>
> That should lead you into a *real* benefit of not wasting time setting
> extraneous keys, or fetching keys that never exist, or finding places to
> combine data or issue multigets more correctly.
>
> With respect to your multiget note, I went over this in quite a bit of
> detail: http://dormando.livejournal.com/521163.html
>
> If you're multiget'ing related data, there's zero reason for it to hit
> more than one memcached instance. Except maybe you're fetching mass
> numbers of huge keys and it makes more sense for the TCP sessions to be
> split up in parallel. I dunno.
>
> In one final note, I'd really really appreciate it if you could stop
> hijacking threads to promote your patch. It's pretty rude, as your garbage
> collector issue has been discussed on the list several times.
>
> On Thu, 22 Jul 2010, Jakub Łopuszański wrote:
>
> > Well, I beg to differ.
> > We used to have evictions > 0, actually around 200 (per whatever munin counts them), so we used to think that we had too few machines, and kept adding them.
> > After using the patch, the memory usage dropped by 80%, and we have had no evictions for a long time, which means that evictions were misleading, and happened just because LRU sometimes kills fresh items,
> > even though there are lots of outdated keys.
> >
> > Moreover it's not like RAM usage "fluctuates wildly". It's kind of constant, or at least periodic, so you can very accurately say if something bad happened, as it would be instantly visible as a deviation
> > from yesterday's charts. Before applying the patch, you might as well not look at the chart at all, as it was more than certain to show 100% usage, which in my opinion gives no clue about what is
> > actually going on.
> >
> > Even if you are afraid of "wildly fluctuating" charts, you will not solve the problem by hiding it, and this is what actually happens if you don't have GC: the traffic, the number of outdated keys, they
> > all fluctuate, but you just don't see it if the chart always shows 100% usage...
> >
> > 2010/7/22 Brian Moon <[email protected]>
> >       On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
> >             I see that my patch for garbage collection is still being ignored, and
> >             your post gives me some idea about why it is so.
> >             I think that RAM is a real problem, because currently (without GC) you
> >             have no clue about how much RAM you really need. So you can end up
> >             blindly buying more and more machines, which effectively means that
> >             multiget works worse and worse (client issues one big multiget but it
> >             gets split into many packets to many servers).
> >             Currently we try to reduce the number of servers in the cluster based
> >             on the real consumption, to get more from the multiget feature.
> >
> >
> > I would never, never, never want my memcached daemon ram usage to fluctuate wildly. Eviction rate is a much better determination of how well your cache is being used.
> >
> > --
> >
> > Brian.
> > --------
> > http://brian.moonspot.net/
Jakub Łopuszański
Re: Using PCIe SSDs instead of RAM
July 23, 2010 09:20AM
On Fri, Jul 23, 2010 at 8:47 AM, dormando <[email protected]> wrote:

> I tried.
>
> Try the engine branch?
>
> I guess, I'll have to at some point.

Just wanted to say that LRU was designed as an algorithm for a uniform cost
model, where all elements are almost equally important (have the same cost
of a miss) and the only thing that distinguishes them is the pattern of
accesses. This is clearly not a good model for memcache, where some
elements are totally unimportant as they have already expired, some elements
are larger than others, some are always processed in batches
(multigets), and so on. In my opinion GC moves reality closer to the
model by removing unimportant elements, so if you want LRU to work
correctly you should at least perform GC. You could also try to modify LRU
to model that one large item actually occupies space that could be better
utilized by several small elements (this is also a simple change). If you
feel comfortable without GC, I am OK with that, just do not suggest that GC
is against LRU.
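
To make the disagreement concrete, here is a toy model of an LRU with lazy expiry. It is only a sketch: `ToyLRU`, its item-count capacity and the `gc()` pass are hypothetical names and simplifications for illustration, not memcached's slab allocator and not the patch discussed in this thread.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy LRU with lazy expiry; capacity is counted in items, not bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> (value, expires_at); last = most recent
        self.evictions = 0

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= now:
            del self.items[key]  # lazy expiry: freed only when fetched
            return None
        self.items.move_to_end(key)
        return value

    def set(self, key, value, ttl, now):
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the LRU tail, expired or not
            self.evictions += 1
        self.items[key] = (value, now + ttl)

    def gc(self, now):
        """Hypothetical GC pass: drop every expired item, return how many."""
        dead = [k for k, (_, exp) in self.items.items() if exp <= now]
        for k in dead:
            del self.items[k]
        return len(dead)


def run(with_gc):
    cache = ToyLRU(capacity=2)
    cache.set("live", 1, ttl=1000, now=0)
    cache.set("dead", 2, ttl=1, now=1)  # expires at t=2, but is never fetched
    if with_gc:
        cache.gc(now=10)
    cache.set("new", 3, ttl=1000, now=10)
    return cache
```

In this toy, `run(False)` evicts the still-valid `live` item while the expired `dead` item keeps its slot; `run(True)` frees `dead` first and records no eviction. Whether that difference matters in production is exactly what the thread is arguing about.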
Ben Manes
Re: Using PCIe SSDs instead of RAM
July 23, 2010 08:40PM
There are alternatives to LRU, which is generally chosen for being extremely
simple to implement, fast, and having a reasonable hit rate. The
Greedy-Dual-Size-Frequency policy may be more appropriate for memcached as it
accounts for a value's weight. I doubt that there's a lot of value in changing the
current design, but there are alternative approaches that would need to be
considered if GC were a serious consideration.
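
For readers unfamiliar with the policy, a minimal sketch of Greedy-Dual-Size-Frequency follows. It uses the commonly cited priority `clock + frequency * cost / size`, where `clock` is raised to the priority of each evicted item. `ToyGDSF` is a hypothetical name, the cost defaults to 1, and the linear-scan eviction is a simplification (a real implementation would use a priority queue).

```python
class ToyGDSF:
    """Toy Greedy-Dual-Size-Frequency cache that tracks sizes only."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0   # inflation value: priority of the last evicted item
        self.entries = {}  # key -> [priority, frequency, size, cost]

    def _priority(self, freq, size, cost):
        return self.clock + freq * cost / size

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return False
        entry[1] += 1  # one more hit
        entry[0] = self._priority(entry[1], entry[2], entry[3])
        return True

    def set(self, key, size, cost=1.0):
        if size <= 0 or size > self.capacity:
            raise ValueError("size must be in 1..capacity")
        if key in self.entries:
            self.used -= self.entries.pop(key)[2]
        while self.used + size > self.capacity:
            # O(n) scan for the lowest priority; fine for a sketch.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            self.clock = self.entries[victim][0]
            self.used -= self.entries.pop(victim)[2]
        self.entries[key] = [self._priority(1, size, cost), 1, size, cost]
        self.used += size


cache = ToyGDSF(capacity_bytes=100)
cache.set("small", 10)
cache.set("big", 80)
cache.set("other", 20)  # needs room: "big" has the lowest priority and goes
```

A plain LRU would have evicted `small` here because it is the oldest entry; GDSF evicts `big` because one large, rarely used item frees more space per unit of expected hits lost.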




Dustin
Re: Using PCIe SSDs instead of RAM
July 24, 2010 05:20AM
On Jul 23, 11:31 am, Ben Manes <[email protected]> wrote:
> There are alternatives to LRU, which is generally chosen for being extremely
> simple to implement, fast, and having a reasonable hit rate. The
> Greedy-Dual-Size-Frequency policy may be more appropriate for memcached as it
> accounts for a value's weight. I doubt that there's a lot of value in changing the
> current design, but there are alternative approaches that would need to be
> considered if GC were a serious consideration.

An engine that does this would be welcome. :)

A big reason storage engines were introduced a while back was so
that people with different theories of operation could implement
new storage or eviction models and have them maintain relevance as the
memcached core itself progresses forward.

There's nobody to say you can't have your own engine for people to
try out (and perhaps even have excellent luck in different
environments), and if/when a universally better model arises, we can
change defaults.
dormando
Re: Using PCIe SSDs instead of RAM
July 25, 2010 12:50PM
> On Fri, Jul 23, 2010 at 8:47 AM, dormando <[email protected]> wrote:
> I tried.
>
> Try the engine branch?
>
> I guess, I'll have to at some point.
>
> Just wanted to say that LRU was designed as an algorithm for a uniform cost model, where all elements are almost equally important (have the same cost of a miss) and the only thing that distinguishes them is
> the pattern of accesses. This is clearly not a good model for memcache, where some elements are totally unimportant as they have already expired, some elements are larger than others, some are always
> processed in batches (multigets), and so on. In my opinion GC moves reality closer to the model by removing unimportant elements, so if you want LRU to work correctly you should at least perform GC.
> You could also try to modify LRU to model that one large item actually occupies space that could be better utilized by several small elements (this is also a simple change). If you feel comfortable without
> GC, I am OK with that, just do not suggest that GC is against LRU.

Alright, I'm sorry. I've been unfair to you (and a few others recently).
I've been unnecessarily grumpy. I tried to explain myself as fairly as
possible, and Dustin added the words that I apparently forgot already, in
that these things are better pressed through via SE's.

I get annoyed by these threads because:

- I really don't care for arguments on this level. When I said "GC goes
against the LRU" I mean that the LRU we have doesn't require GC. The whole
point of adding the LRU was so we could skip that part. I'm describing
*intent*, I'm just too tired to keep arguing these things.

- The thread hijacking is seriously annoying. If you want to ping us about
an ignored patch, start a new thread or necro your own old thread. :(

- Your original e-mail opened with "We run this in single threaded mode
and the performance is good enough for us so please merge it". I'm pretty
dumbfounded that people can take a project which is supposed to be the
performant underpinnings of the entire bloody internet and not do any sort
of performance testing.

I try to test things and I do have some hardware on hand but I'm still
trying to find the motivation in myself to do a thorough performance
run through of the engine branch. There's a lot of stuff going on in
there. This is time consuming and often frustrating work.

You did make a good attempt at building an efficient implementation, and
it's a very clever way to go about the business, but best case:

- You're adding logic to the most central global lock
- You're adding 16 bytes per object
- Plus some misc memory overhead (minor).

If they're not causing the locks to be problems, the memory efficiency
drop is an issue for many more people. If we make changes to the memory
requirements of the default engine, I really only want to entertain ideas
that make it *drop* requirements (we have some, need to start testing
them as the engine stuff gets out there).

The big picture is many users have small items, and if we push this change
many people will suffer.
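
As a rough worked example of why small items are the concern, the arithmetic can be sketched as below. The per-item footprints are assumed values chosen for illustration, not measurements, and slab-class rounding is ignored.

```python
def memory_growth(item_bytes, extra_bytes=16):
    """Fractional growth in per-item footprint when extra_bytes are added."""
    return extra_bytes / item_bytes


def capacity_loss(item_bytes, extra_bytes=16):
    """Fraction of items that no longer fit in the same amount of memory."""
    return extra_bytes / (item_bytes + extra_bytes)


if __name__ == "__main__":
    # Assumed total per-item footprints in bytes (header + key + value).
    for size in (64, 128, 1024, 65536):
        print(f"{size:>6} B item: +{memory_growth(size):.1%} memory, "
              f"-{capacity_loss(size):.1%} items in the same RAM")
```

Under these assumptions a 64-byte item grows by a quarter, while a 64 KB item is essentially unaffected.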

Yes it's true that once those metrics "expose" an issue you technically
already have an issue, but it's not an instant dropoff. Easily calculable
with graphs and things like the "evicted_time" stats. Items dropping off
the end that haven't been touched in 365,000+ seconds aren't likely to
cause you a problem tomorrow or even next week, but watch for that number
to fall. This is also why the evicted and evicted_nonzero stats were
split. Eviction of an item with a 0 expiration is nearly meaningless.
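
A minimal sketch of watching those numbers, assuming you already have the text output of memcached's `stats items` command. The sample values and the one-hour threshold are invented for illustration; `parse_stats_items` and `worrying_classes` are hypothetical helper names.

```python
def parse_stats_items(text):
    """Turn 'STAT items:<class>:<name> <value>' lines into {class: {name: value}}."""
    per_class = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT":
            continue
        fields = parts[1].split(":")
        if len(fields) != 3 or fields[0] != "items":
            continue
        per_class.setdefault(int(fields[1]), {})[fields[2]] = int(parts[2])
    return per_class


def worrying_classes(per_class, min_evicted_time=3600):
    """Slab classes that evict items which were touched only recently."""
    return sorted(
        slab for slab, stats in per_class.items()
        if stats.get("evicted", 0) > 0
        and stats.get("evicted_time", 0) < min_evicted_time
    )


# Invented sample output, for illustration only.
SAMPLE = """\
STAT items:1:number 5000
STAT items:1:evicted 120
STAT items:1:evicted_nonzero 0
STAT items:1:evicted_time 365000
STAT items:5:number 800
STAT items:5:evicted 40
STAT items:5:evicted_nonzero 40
STAT items:5:evicted_time 90
END
"""

stats = parse_stats_items(SAMPLE)
flagged = worrying_classes(stats)
```

With the sample above only slab class 5 is flagged: class 1 evicts items untouched for days, whereas class 5 is dropping items that were used 90 seconds ago.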

However, I can't seem to get this through without being rude to people,
and I apologize for that. I should've responded to your original message
with these *technical* problems instead of just harping on the idea that
it looks like you weren't using all of the available statistics properly.

I'm trying to chillax and get back to being a fun (albeit grumpy)
productive hacker dude. Sorry, all.

-Dormando
Jakub Łopuszański
Re: Using PCIe SSDs instead of RAM
July 25, 2010 08:20PM
Thanks for the explanation.

I see that we have entirely different points of view, probably caused by
totally different sets of identified bottlenecks, different usage, different
configurations etc. (I assume that you have greater experience, since mine is
restricted to one company, with just 55 memcache machines). For example, you
often talk about locks and CPU usage, while we observed that (not
surprisingly to us) those O(1) operations are relatively insignificant
compared to socket operations, which take ages.

I agree that 16 extra bytes is a serious problem though. If I had time I
would definitely try to implement a version that uses just 8 bytes or less
(for example by reimplementing TTL buckets as an array of pointers to items
hashed by item address). This was just a proof of concept that you can have
GC in O(1), which some people claimed to be difficult, and it turned out to
work very well for us at nk.pl.
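
For readers who have not seen the patch, the general idea of TTL buckets can be sketched as follows. This is a hypothetical illustration of the technique in Python, not the nk.pl implementation: items are grouped by the second in which they expire, so a sweep frees whole buckets without scanning live items. It assumes integer timestamps that never go backwards.

```python
from collections import defaultdict


class TTLBuckets:
    """Toy store that frees expired items bucket by bucket."""

    def __init__(self, now=0):
        self.values = {}                 # key -> (value, expires_at or None)
        self.buckets = defaultdict(set)  # expiry second -> keys expiring then
        self.swept_to = now              # buckets up to this second are freed

    def set(self, key, value, ttl, now):
        self.delete(key)
        expires_at = now + ttl if ttl > 0 else None  # ttl 0: never expires
        self.values[key] = (value, expires_at)
        if expires_at is not None:
            self.buckets[expires_at].add(key)

    def delete(self, key):
        entry = self.values.pop(key, None)
        if entry is None or entry[1] is None:
            return
        bucket = self.buckets[entry[1]]
        bucket.discard(key)
        if not bucket:
            del self.buckets[entry[1]]

    def sweep(self, now):
        """Free every item whose expiry second has passed; return the count."""
        freed = 0
        for second in range(self.swept_to + 1, now + 1):
            for key in self.buckets.pop(second, ()):
                del self.values[key]
                freed += 1
        self.swept_to = max(self.swept_to, now)
        return freed
```

Each item is added to one bucket and removed from it once, so the work per item is constant; the sweep itself costs one step per elapsed second plus one per freed item. The per-item bookkeeping (here a set entry) is the kind of memory cost being debated above.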

Sorry for thread hijacking, and all.

dormando
Re: Using PCIe SSDs instead of RAM
July 25, 2010 10:20PM
On Sun, 25 Jul 2010, Jakub Łopuszański wrote:

> Thanks for the explanation.
> I see that we have entirely different points of view, probably caused by totally different sets of identified bottlenecks, different
> usage, different configurations etc. (I assume that you have greater experience, since mine is restricted to one company, with just 55
> memcache machines). For example, you often talk about locks and CPU usage, while we observed that (not surprisingly to us) those O(1)
> operations are relatively insignificant compared to socket operations, which take ages.
>
> I agree that 16 extra bytes is a serious problem though. If I had time I would definitely try to implement a version that uses just 8
> bytes or less (for example by reimplementing TTL buckets as an array of pointers to items hashed by item address). This was just a proof
> of concept that you can have GC in O(1), which some people claimed to be difficult, and it turned out to work very well for us at nk.pl.
>
> Sorry for thread hijacking, and all.

It's not hard to make it work, it's hard to make it work for everyone.
There're lots of things that I could add to memcached in a day each, but
it would make it less accessible instead of more accessible at the end of
the day.