Avoiding Nginx restart when rsyncing cache across machines

Quintin Par
Avoiding Nginx restart when rsyncing cache across machines
September 12, 2018 01:50AM
I run a mini CDN for a static site by having nginx cache machines (in
different locations) in front of the origin, load balanced by
Cloudflare.

Periodically I run an rsync pull to update the cache on each of these
machines. This works well, except that I've realized I need to
restart nginx - a reload isn't updating the cache in memory.

I really want to avoid the restart. Is this possible? Or maybe I am
doing something wrong here.

- Quintin
Maxim Dounin
Re: Avoiding Nginx restart when rsyncing cache across machines
September 12, 2018 04:50PM
Hello!

You are not expected to modify cache contents yourself. Doing so
will likely cause various troubles - including nginx not using new
files placed into the cache after it was loaded from disk, not
maintaining the configured cache max_size, and so on.

If you want to control cache contents yourself by syncing data
across machines, you may have better luck by using proxy_store
and normal files instead.
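
A minimal sketch of that approach, adapted from the proxy_store
example in the nginx documentation (the origin hostname and paths
here are placeholders, not from this thread):

    location / {
        root        /data/mirror;
        error_page  404 = @fetch;
    }

    location @fetch {
        internal;
        # Fetch the missing file from the origin and store it as a
        # plain file under the same root; plain files can then be
        # rsynced between machines safely.
        proxy_pass         http://origin.example.com;
        proxy_store        on;
        proxy_store_access user:rw group:rw all:r;
        # Keep temp files on the same file system as the root, so
        # the final rename into place is atomic.
        proxy_temp_path    /data/temp;
        root               /data/mirror;
    }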

--
Maxim Dounin
http://mdounin.ru/
Hi Maxim,

Thank you for this. It opened my eyes.

Not to sound demanding, but do you have any examples (code) of
proxy_store being used as a CDN? What's most important to me is the
initial cache warming: I should be able to start a new machine with
30 GB of cache vs. a cold start.

Thanks once again.

- Quintin


Can I ask why you need to start with a warm cache directly? Sure, it
will lower the requests to the origin, but you could implement a
secondary caching layer if you wanted to (using nginx): your primary
cache in, let's say, 10 locations spread across 3 continents (US, EU,
Asia), and then a second layer consisting of a smaller number of
locations (1 instance in each continent). This way you'll warm up
faster when you add new servers, and it won't really affect your
origin server.

It's also a lot cleaner, because you're able to use proxy_cache,
which is really what (in my opinion) you should use when you're
building caching proxies.

Generally I'd just slowly warm up new servers prior to putting them
into production: get a list of the top X files accessed, and loop
over them to pull them in as normal HTTP requests.

There are plenty of decent solutions (some more complex than others),
but there should really never be a reason to have to sync your cache
across machines - even for new servers.
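
A rough sketch of an edge in such a two-layer setup (zone names,
sizes, and hostnames are illustrative, not from this thread):

    # Edge cache: pulls from a regional mid-tier instead of the origin.
    proxy_cache_path /var/cache/nginx/edge levels=1:2
                     keys_zone=edge:100m max_size=30g inactive=60d
                     use_temp_path=off;

    server {
        listen 80;
        server_name cdn.example.com;

        location / {
            proxy_cache       edge;
            proxy_cache_valid 200 30d;
            # The mid-tier runs the same config, except its
            # proxy_pass points at the real origin.
            proxy_pass        http://midtier-eu.example.com;
        }
    }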

Hi Lucas,

The cache is pretty big and I want to limit unnecessary requests if
I can. Cloudflare is in front of my machines and I pay for load
balancing, firewall, and Argo, among other things. So there is a cost
per request.

Admittedly I have a not-so-complex cache architecture, i.e. all cache
machines in front of the origin, and it has worked so far. This is
also because I am not that great a programmer/admin :-)

My optimization is not primarily around hits to the origin, but
rather bandwidth and number of requests.

- Quintin


Peter Booth via nginx
Re: Avoiding Nginx restart when rsyncing cache across machines
September 13, 2018 01:40AM
Quintin,

Are most of your requests for dynamic or static content?
Are the requests clustered, such that there are a lot of requests for
a few (between 5 and 200, say) URLs?
If three different people make the same request, do they get
personalized or identical content returned?
How long are the cached resources valid for?

I have seen layered caches deliver enormous benefit, both in terms of
performance and of ensuring availability - which is usually
synonymous with “protecting the backend.” That protection was most
useful when, for example, I was working on a site that would get
mentioned in a TV show at a known time of day every week. nginx
proxy_cache was invaluable in helping the site stay up and responsive
when hit with enormous spikes of requests.

This is nuanced, subtle stuff though.

Is your site something that you can disclose publicly?


Peter



Hi Peter,

Here are my stats for this week: https://imgur.com/a/JloZ37h . The
Bypass is only because I was experimenting with some cache-warmer
scripts. This is primarily a static website.

Here's my URL hit distribution: https://imgur.com/a/DRJUjPc

If three people are making the same request, they get identical
content. No personalization. The pages are cached for 200 days, with
inactive in proxy_cache_path set to 60 days.
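
In nginx terms those numbers would correspond to something like the
following (the zone name and sizes are illustrative):

    # Entries not accessed for 60 days are evicted ...
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:50m
                     max_size=30g inactive=60d;

    # ... while a cached 200 response stays fresh for 200 days.
    proxy_cache_valid 200 200d;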



This is embarrassing, but my CDN is primarily $5 DigitalOcean
machines across the web with this nginx cache setup. The server
response time averages 0.29 seconds; prior to my ghetto CDNing it was
0.98 seconds. I am pretty proud that I have survived several Slashdot
effects on the $5 machines serving cached content, peaking at 2,500
requests/second, without any issues.

Since this is working well, I don't want to do any layered caching
unless there is a compelling reason.

- Quintin


> The cache is pretty big and I want to limit unnecessary requests if I can.

30 GB of cache and ~400k hits isn't a lot.

> Cloudflare is in front of my machines and I pay for load balancing, firewall, and Argo, among other things. So there is a cost per request.

Doesn't matter if you pay for load balancing, firewall, Argo, etc. -
implementing a secondary caching layer won't increase your costs on
the CloudFlare side of things, because you're not communicating via
CloudFlare but rather between machines. You'd connect your X
locations to a smaller number of locations, doing direct traffic
between your DigitalOcean instances - so no CloudFlare costs
involved.

Communication between your CDN servers and your origin server also
(IMO) shouldn't go via any CloudFlare-related products, so additional
hits on the origin will be "free" at the expense of a bit higher
load. However, since only a subset of locations would request via the
origin, and they then serve as the origin for your other servers,
you're effectively decreasing the origin traffic.

You should easily be able to get a 97-99% offload of your origin (in
my own setup, it's at 99.95% at this point), even without using a
secondary layer, and performance can be improved by using stuff such
as:

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_background_update

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale_updating

nginx is smart enough to do a sub-request in the background to check
whether the origin content was updated (using Last-Modified or ETag
headers, for example) - this way the origin communication would be
minimal anyway.
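
Wired together, those directives could look like this (the hostname
and cache zone are placeholders):

    location / {
        proxy_pass  http://origin.example.com;
        proxy_cache edge;

        # Serve the stale copy immediately and refresh it in the
        # background instead of making the client wait.
        proxy_cache_background_update on;
        proxy_cache_use_stale         updating error timeout;

        # Revalidate expired entries with conditional requests
        # (If-Modified-Since / If-None-Match) instead of
        # re-downloading unchanged files.
        proxy_cache_revalidate on;
    }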

The only Load Balancer / Argo / Firewall costs you should have are
for the "CDN server -> end user" traffic, and those won't increase or
decrease by doing a normal proxy_cache setup or a setup with a
secondary cache layer.

You also won’t increase costs by doing a warmup of your CDN servers – you could do something as simple as:

curl -o /dev/null -k -I --resolve cdn.yourdomain.com:443:127.0.0.1 https://cdn.yourdomain.com/img/logo.png

You could do the same with python or another language if you’re feeling more comfortable there.

However, using a method like the above will result in your warmup
being kept "local": since you're resolving cdn.yourdomain.com to
localhost, requests that are not yet cached will use whatever is
configured in your proxy_pass in the nginx config.
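
For instance, a warmup loop over the most-requested URLs might look
like this (urls.txt and the awk field are assumptions based on the
default "combined" access log format):

    # Hypothetical: build urls.txt from the top 100 requested paths.
    awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c \
        | sort -rn | head -100 | awk '{print $2}' > urls.txt

    # Pull each one through the local nginx, keeping the warmup local.
    while read -r path; do
        curl -s -o /dev/null -I \
             --resolve cdn.yourdomain.com:443:127.0.0.1 \
             "https://cdn.yourdomain.com${path}"
    done < urls.txt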

> Admittedly I have a not-so-complex cache architecture, i.e. all cache machines in front of the origin, and it has worked so far

I would say it's complex if you have to sync your content - many
pull-based CDNs simply do a normal proxy_cache + proxy_pass setup,
not syncing content, and then use some of the nifty features (such as
proxy_cache_background_update and proxy_cache_use_stale updating) to
decrease the origin traffic, possibly implementing a secondary layer
if they're still doing a lot of origin traffic (e.g. because of
having a lot of "edge servers"). If you're like 10 servers, I
wouldn't even consider a secondary layer unless your origin is under
heavy load and can't handle 10 possible clients (the CDN servers).

Best Regards,
Lucas Rolff


Maxim Dounin
Re: Avoiding Nginx restart when rsyncing cache across machines
September 13, 2018 12:50PM
Hello!

Simple examples of using proxy_store can be found in the
documentation, see here:

http://nginx.org/r/proxy_store

It usually works well when you need to mirror static files which
never change. Note though that if you need to implement cache
expiration, or need to preserve custom response headers, this might
be a challenge.

--
Maxim Dounin
http://mdounin.ru/
Hi Lucas,

Thank you for this. Gems all over. I didn't know curl had --resolve.

This is a more generic question: how does one ensure cache
consistency on all edges? Do people resort to a combination of expiry
+ background update + stale responding? What if one edge and the
origin were updated to the latest version, and I now want all the
other 1,000 edges updated within a minute, but the content expiry is
100 days?

- Quintin


> How does one ensure cache consistency on all edges?

I wouldn't - you can never really rely on anything being consistently cached; there will always be stuff that doesn't follow the standards and can thus give an inconsistent state for one or more users.

What I'd do would simply be to purge the files whenever needed (and possibly warm them up if you want them to be "hot" when visitors arrive); sure, the first 1-2 visitors in each location might have a bit slower request, but that's about it.

Alternatively you could just set a super-low cache-control. When you're using proxy_cache_background_update and proxy_cache_use_stale updating, nginx will ask the origin server if the file has changed - so if it hasn't, you'll simply get a 304 from the origin (if the origin supports it). You'll do more requests to the origin, but traffic will be minimal because it just returns 304 Not Modified (plus some headers).
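
As a concrete (hypothetical) sketch of the purge approach: with the
third-party ngx_cache_purge module compiled in - it is not part of
stock nginx - each edge could expose a restricted purge endpoint,
following the module's README:

    location / {
        proxy_pass      http://origin.example.com;
        proxy_cache     edge;
        proxy_cache_key $uri$is_args$args;
    }

    # Only trusted hosts may purge.
    location ~ /purge(/.*) {
        allow 127.0.0.1;
        deny  all;
        proxy_cache_purge edge $1$is_args$args;
    }

A script could then purge one URL on every edge after an update (the
edge IPs are placeholders):

    for edge in 203.0.113.10 203.0.113.11; do
        curl -s "http://${edge}/purge/img/logo.png"
    done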

Best Regards,
Lucas Rolff


Peter Booth via nginx
Re: Avoiding Nginx restart when rsyncing cache across machines
September 14, 2018 03:30AM
One more approach is to never change the contents of a resource without also changing its name. One example would be the cache_key feature in Rails, where resources have a path based on some ID and their updated_at value. Whenever you modify a resource, it automatically expires.
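
Outside of Rails the same idea is just a build step; a hypothetical
sketch:

    # Embed a content hash in the file name: a changed file gets a
    # new URL, so stale cached copies are never served under the new
    # name.
    hash=$(md5sum img/logo.png | cut -c1-8)
    cp img/logo.png "public/img/logo-${hash}.png"
    # HTML then references e.g. /img/logo-3f2a9c1b.png, and edges
    # can cache it "forever".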


It is fairly simple to hack nginx and use Lua to reload the cache,
either on a timer or via a request. The code is already there; it's
just a matter of calling it again.

