Moving SSL termination to the edge increased the instance of 502 errors

Posted by Michael Ottoson 
Hi All,

We installed nginx as load balancer/failover in front of two upstream web servers.

At first SSL terminated at the web servers and nginx was configured as TCP passthrough on 443.

We rarely experienced 502s, and when we did it was likely due to tuning/tweaking.

About a week ago we moved SSL termination to the edge. Since then we've been getting daily 502s. A small percentage - never reaching 1%. But with ½ million requests per day, we are starting to get complaints.

Stranger: the percentage seems to be rising.

I have more details and a pretty picture here:

https://serverfault.com/questions/885638/moving-ssl-termination-to-the-edge-increased-the-instance-of-502-errors


Any advice how to squash those 502s? Should I be worried nginx is leaking?
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
Hello!

On Wed, Nov 29, 2017 at 04:27:37AM +0000, Michael Ottoson wrote:

> Any advice how to squash those 502s? Should I be worried nginx is leaking?

First of all, you have to find the reason for these 502 errors.
Looking into the error log is a good start.

According to the serverfault question, you see "no live upstreams"
errors in the logs. These errors mean that all configured upstream
servers were disabled due to previous errors (see
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#max_fails);
that is, they are just a consequence of earlier errors. You
have to find the real errors; they should be in the error log too.

--
Maxim Dounin
http://mdounin.ru/
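
For anyone following along, here is a minimal sketch of the log triage described above. It assumes the usual nginx error-log line shape; the sample lines, addresses, and the upstream name "backend" are made up for illustration and are not taken from the poster's logs, and summarize_upstream_errors is a hypothetical helper. It separates the "no live upstreams" symptom from the upstream errors that were logged alongside it.

```python
import re
from collections import Counter

# Made-up sample lines in the usual nginx error-log shape (assumption:
# default error_log format). They are NOT from the poster's logs.
SAMPLE_LOG = """\
2017/11/29 04:20:01 [error] 1234#0: *101 connect() failed (111: Connection refused) while connecting to upstream, client: 203.0.113.5, server: example.com, request: "GET / HTTP/1.1", upstream: "http://10.0.0.11:80/", host: "example.com"
2017/11/29 04:20:02 [error] 1234#0: *102 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 203.0.113.6, server: example.com, request: "GET /a HTTP/1.1", upstream: "http://10.0.0.12:80/a", host: "example.com"
2017/11/29 04:20:03 [error] 1234#0: *103 no live upstreams while connecting to upstream, client: 203.0.113.7, server: example.com, request: "GET /b HTTP/1.1", upstream: "http://backend/b", host: "example.com"
"""

# "<date> <time> [level] pid#tid: *conn <message>, client: ..."
LINE_RE = re.compile(
    r"^\S+ \S+ \[(?P<level>\w+)\] \d+#\d+: (?:\*\d+ )?"
    r"(?P<msg>.*?)(?:, client: .*)?$"
)


def summarize_upstream_errors(log_text):
    """Return (symptom_count, causes).

    symptom_count: number of "no live upstreams" lines.
    causes: Counter of every other upstream-related message.
    """
    symptoms = 0
    causes = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        msg = match.group("msg")
        if "upstream" not in msg:
            continue
        if msg.startswith("no live upstreams"):
            symptoms += 1
        else:
            causes[msg] += 1
    return symptoms, causes


if __name__ == "__main__":
    symptoms, causes = summarize_upstream_errors(SAMPLE_LOG)
    print(f"'no live upstreams' lines: {symptoms}")
    for msg, count in causes.most_common():
        print(f"{count:5d}  {msg}")
```

To try it on a real log, read the file's text and pass it in place of SAMPLE_LOG. The messages with the highest counts in the second part of the output are the likeliest candidates for the "real errors".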
_______________________________________________
Thanks, Maxim.

That makes a lot of sense. However, the problem started at exactly the same time we moved SSL termination. There were no changes to the application. It is unlikely to be a mere coincidence - but it could be.

We were previously using HAPROXY for load balancing (well, the company we inherited this from did) and the same happened when they tried moving SSL termination.

There is a reply to my question on serverfault, suggesting increasing keepalives (https://www.nginx.com/blog/load-balancing-with-nginx-plus-part2/#keepalive). This is because moving SSL increases the number of TCP connects. I'll give that a try and report back.

-----Original Message-----
From: nginx [mailto:[email protected]] On Behalf Of Maxim Dounin
Sent: Wednesday, November 29, 2017 7:43 AM
To: nginx@nginx.org
Subject: Re: Moving SSL termination to the edge increased the instance of 502 errors

_______________________________________________
What's the backend for this: IIS, NGINX, Apache, etc.? Does it require SNI?
Do you have multiple hostnames?

On Wed, Nov 29, 2017 at 7:05 AM, Michael Ottoson <[email protected]>
wrote:

> Thanks, Maxim.
>
> That makes a lot of sense. However, the problem started at exactly the
> same time we moved SSL termination. There were no changes to the
> application. It is unlikely to be a mere coincidence - but it could be.
_______________________________________________
There are many things that *could* cause what you’re seeing - say at least eight. You might be lucky and guess the right one - but it's probably smarter to see exactly what the issue is.

Presumably you changed your upstream webservers to do this work, replacing SSL with unencrypted connections? Do you have sar data showing the number of TCP connections before and after the change? Perhaps every request is negotiating SSL now?
What if you add another nginx instance that doesn’t use SSL at all (just as a test) - does that also have 502s? You probably have the data you need to isolate the issue.
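
As a rough illustration of the connection-count question above: the post asks about sar data, but this sketch swaps in a snapshot of `ss -tan` output (Linux, iproute2) taken on the nginx host, and counts TCP states for connections to the upstream port. The sample output, the addresses, and the helper name count_states_to_port are all made up for illustration.

```python
from collections import Counter

# Made-up sample in the shape of `ss -tan` output; not real measurements.
SAMPLE_SS = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.1:51234      10.0.0.11:80
TIME-WAIT  0      0      10.0.0.1:51235      10.0.0.11:80
TIME-WAIT  0      0      10.0.0.1:51236      10.0.0.12:80
LISTEN     0      128    0.0.0.0:443         0.0.0.0:*
"""


def count_states_to_port(ss_output, peer_port):
    """Count TCP states for connections whose peer port is peer_port."""
    counts = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        state, peer = fields[0], fields[4]
        # The peer column is "address:port"; compare only the port part.
        if peer.rsplit(":", 1)[-1] == str(peer_port):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, count in count_states_to_port(SAMPLE_SS, 80).most_common():
        print(f"{state:10s} {count}")
```

A large number of TIME-WAIT entries relative to ESTAB towards the upstream port would be consistent with a new TCP connection being opened per request, which is the situation the keepalive suggestion is aimed at. Treat it as a hint rather than a diagnosis.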

_______________________________________________
Since the upstream has now changed TCP ports, do check whether it is a
firewall, network buffer, etc. issue on the new port too.

On Wed, Nov 29, 2017 at 11:42 PM, Peter Booth <[email protected]> wrote:

> There are many things that *could* cause what you’re seeing - say at least
> eight. You might be lucky and guess the right one- but probably smarter to
> see exactly what the issue is.

--
*Anoop P Alias*
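
A quick way to test the new upstream port from the nginx host, sketched under assumptions: check_tcp_port is a hypothetical helper, and the host, port, attempt count, and timeout are whatever applies to the setup in question. It only shows whether a plain TCP connection can be opened, and how long that takes. It will not show buffer exhaustion directly, but refused or slow connects are a hint that something between nginx and the upstream is getting in the way.

```python
import socket
import time


def check_tcp_port(host, port, attempts=5, timeout=2.0):
    """Open a TCP connection `attempts` times.

    Returns one tuple per attempt: (True, seconds_to_connect) on success,
    or (False, error_text) if the connection failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # includes refused connections and timeouts
            results.append((False, str(exc)))
    return results
```

For example, calling check_tcp_port with the upstream's address and its new plain-HTTP port, then printing the returned tuples, shows at a glance whether any attempt was refused or timed out.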