[PATCHES] 3 patches for DNS SRV records

Baptiste Assmann
[PATCHES] 3 patches for DNS SRV records
August 11, 2017 11:20AM
Hi All

So, I enabled the latest (brilliant) contribution from Olivier in my
Kubernetes cluster and discovered it did not work as expected.
After digging into the issues, I found 3 bugs directly related to the
way SRV records must be read and processed by HAProxy.
They were clearly hard to spot outside a real orchestrator :)

Please find attached 3 patches to fix them.

Please note that I might have found another bug, which I'll dig into
later.
When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
considers some servers (pods in Kubernetes) in error "no dns
resolution". This is normal. What is not normal is that those servers
never come back to life, even when I scale up again...

Note that thanks to Fred's (Salut!) server-template contribution some
time ago, we can do some very fancy configurations like the one below
(I have a headless service called 'red' in my Kubernetes cluster that
points to my 'red' application):

backend red
  server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check

In one line, we can enable automatic "scaling follow-up" in HAProxy.
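Expanded, such a setup could look like the sketch below. The resolvers
section name 'kube', its nameserver address, and the timer values are
assumptions; point the nameserver at your own cluster DNS service:

```
resolvers kube
  # Assumed address of the cluster DNS service (e.g. kube-dns ClusterIP)
  nameserver dns1 10.96.0.10:53
  resolve_retries 3
  timeout retry   1s
  hold valid      10s

backend red
  balance roundrobin
  # 20 server slots, filled and updated from the SRV records
  server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check
```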

Baptiste

--
Baptiste Assmann <[email protected]>
Aleksandar Lazic
Re: [PATCHES] 3 patches for DNS SRV records
August 11, 2017 12:40PM
Hi Baptiste,

Baptiste Assmann wrote on 11.08.2017:

> Hi All

> So, I enabled latest (brilliant) contribution from Olivier into my
> Kubernetes cluster and I discovered it did not work as expected.
> After digging into the issues, I found 3 bugs directly related to the
> way SRV records must be read and processed by HAProxy.
> It was clearly hard to spot them outside a real orchestrator :)

> Please find in attachment 3 patches to fix them.

> Please note that I might have found another bug, which I'll dig into
> later.
> When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
> considers some servers (pods in Kubernetes) in error "no dns
> resolution". This is normal. What is not normal is that those servers
> never come back to life, even when I scale up again...

> Note that thanks to Fred's (Salut!) server-template contribution
> some time ago, we can do some very fancy configurations like the
> one below (I have a headless service called 'red' in my Kubernetes
> cluster that points to my 'red' application)

> backend red
>   server-template red 20 _http._tcp.red.default.svc.cluster.local:8080
> inter 1s resolvers kube check

> In one line, we can enable automatic "scaling follow-up" in HAProxy.
... for headless services only, right?

For services, normal resolution is enough, imho.

8-O. I don't say the word amazing very often, but now it fits.

That's amazing ;-)

It would be interesting, at least for me, to have haproxy as a
'service controller', instead of the kube-proxy ;-)

Do you also use haproxy for ingress in your kubernetes cluster?

f. e. https://github.com/kubernetes/ingress/tree/master/examples/deployment/haproxy

> Baptiste

--
Best Regards
Aleks
Conrad Hoffmann
Re: [PATCHES] 3 patches for DNS SRV records
August 11, 2017 03:00PM
Hi,

first of all: great to see that this is making progress! I am very excited
about everything related to SRV records and also server-templates. I tested
a fresh master build with these patches applied, here are my observations:

On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
> Hi All
>
> So, I enabled latest (brilliant) contribution from Olivier into my
> Kubernetes cluster and I discovered it did not work as expected.
> After digging into the issues, I found 3 bugs directly related to the
> way SRV records must be read and processed by HAProxy.
> It was clearly hard to spot them outside a real orchestrator :)
>
> Please find in attachment 3 patches to fix them.
>
> Please note that I might have found another bug, which I'll dig into
> later.
> When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
> considers some servers (pods in Kubernetes) in error "no dns
> resolution". This is normal. What is not normal is that those servers
> never come back to life, even when I scale up again...
>
> Note that thanks to Fred's (Salut!) server-template contribution
> some time ago, we can do some very fancy configurations like the
> one below (I have a headless service called 'red' in my Kubernetes
> cluster that points to my 'red' application)
>
> backend red
> server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check
>
> In one line, we can enable automatic "scaling follow-up" in HAProxy.

I tried a very similar setup, like this:

> resolvers servicediscovery
>   nameserver dns1 10.33.60.31:53
>   nameserver dns2 10.33.19.32:53
>   nameserver dns3 10.33.25.28:53
>
>   resolve_retries 3
>   timeout retry   1s
>   hold valid      10s
>   hold obsolete   5s
>
> backend testbackend
>   server-template test 20 http.web.production.<internal-name>:80 check

This is the first time I am testing the server-template keyword at all, but
I immediately noticed that I sometimes get a rather uneven distribution of
pods, e.g. this (with the name resolving to 5 addresses):

> $ echo "show servers state testbackend" | \
> nc localhost 2305 | grep testbackend | \
> awk '{print $5}' | sort | uniq -c
> 7 10.146.112.130
> 6 10.146.148.92
> 3 10.146.172.225
> 4 10.146.89.208

This uses only four of the five servers, with a quite uneven distribution.
Other attempts do use all five servers, but the distribution still
seems pretty uneven most of the time. Is that intentional? Is the list
populated randomly?

Then, nothing changed when I scaled up or down (except the health checks
taking some servers offline); the addresses were never updated. Is that
the bug you mentioned, or am I doing it wrong?

Also, as more of a side note, we do use SRV records, but without underscores
in the names, which I realize is not very common, but also not exactly
forbidden (as far as I understand the RFC, it's more of a suggestion). It
would be great if this could be indicated in some way in the config, maybe.

And lastly, I know this isn't going to be solved on a Friday afternoon, but
I'll let you know that our infrastructure has reached a scale where DNS
over UDP almost never cuts it anymore (due to the number of records
returned). I think many people who turn to e.g. Kubernetes do so
because they have to operate at such scale, so my guess is this might become
one of the more frequently requested features at some point :)

This is just some "quick" feedback; depending on the time I have, I'll try
to take a closer look at a few things and provide more details if possible.

Again, thanks a lot for working on this, let me know if you are interested
in any specific details.

Thanks a lot,
Conrad
--
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
Conrad Hoffmann
Re: [PATCHES] 3 patches for DNS SRV records
August 11, 2017 04:00PM
On 08/11/2017 02:56 PM, Conrad Hoffmann wrote:
> Hi,
>
> first of all: great to see that this is making progress! I am very excited
> about everything related to SRV records and also server-templates. I tested
> a fresh master build with these patches applied, here are my observations:
>
> On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
>> Hi All
>>
>> So, I enabled latest (brilliant) contribution from Olivier into my
>> Kubernetes cluster and I discovered it did not work as expected.
>> After digging into the issues, I found 3 bugs directly related to the
>> way SRV records must be read and processed by HAProxy.
>> It was clearly hard to spot them outside a real orchestrator :)
>>
>> Please find in attachment 3 patches to fix them.
>>
>> Please note that I might have found another bug, which I'll dig into
>> later.
>> When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
>> considers some servers (pods in Kubernetes) in error "no dns
>> resolution". This is normal. What is not normal is that those servers
>> never come back to life, even when I scale up again...
>>
>> Note that thanks to Fred's (Salut!) server-template contribution
>> some time ago, we can do some very fancy configurations like the
>> one below (I have a headless service called 'red' in my Kubernetes
>> cluster that points to my 'red' application)
>>
>> backend red
>> server-template red 20 _http._tcp.red.default.svc.cluster.local:8080
>> inter 1s resolvers kube check
>>
>> In one line, we can enable automatic "scaling follow-up" in HAProxy.
>
> I tried a very similar setup, like this:
>
>> resolvers servicediscovery
>> nameserver dns1 10.33.60.31:53
>> nameserver dns2 10.33.19.32:53
>> nameserver dns3 10.33.25.28:53
>>
>> resolve_retries 3
>> timeout retry 1s
>> hold valid 10s
>> hold obsolete 5s
>>
>> backend testbackend
>> server-template test 20 http.web.production.<internal-name>:80 check
>
> This is the first time I am testing the server-template keyword at all, but
> I immediately noticed that I sometimes get a rather uneven distribution of
> pods, e.g. this (with the name resolving to 5 addresses):
>
>> $ echo "show servers state testbackend" | \
>> nc localhost 2305 | grep testbackend | \
>> awk '{print $5}' | sort | uniq -c
>> 7 10.146.112.130
>> 6 10.146.148.92
>> 3 10.146.172.225
>> 4 10.146.89.208
>
> This uses only four of the five servers, with a quite uneven distribution.
> Other attempts do use all five servers, but the distribution still
> seems pretty uneven most of the time. Is that intentional? Is the list
> populated randomly?
>
> Then, nothing changed when I scaled up or down (except the health checks
> taking some servers offline); the addresses were never updated. Is that
> the bug you mentioned, or am I doing it wrong?

Ok, I am an idiot. I realized I forgot the `resolvers` keyword for the
`server-template` directive. So scaling and DNS updates actually work now,
which is already amazing. However, the distribution thing is still somewhat
of an issue.
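For anyone else tripping over this, the fixed line would look something
like the sketch below (reusing the resolvers section name from the
config quoted above; the placeholder domain is kept as-is):

```
backend testbackend
  server-template test 20 http.web.production.<internal-name>:80 resolvers servicediscovery check
```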


> Also, as more of a side note, we do use SRV records, but without underscores
> in the names, which I realize is not very common, but also not exactly
> forbidden (as far as I understand the RFC, it's more of a suggestion). It
> would be great if this could be indicated in some way in the config, maybe.
>
> And lastly, I know this isn't going to be solved on a Friday afternoon, but
> I'll let you know that our infrastructure has reached a scale where DNS
> over UDP almost never cuts it anymore (due to the amount of records
> returned), and I think many people who are turning to e.g. Kubernetes do so
> because they have to operate at such scale, so my guess is this might be
> one of the more frequently requested features at some point :)
>
> These just as "quick" feedback, depending on the time I'll have I'll try to
> take a closer look at a few things and provide more details if possible.
>
> Again, thanks a lot for working on this, let me know if you are interested
> in any specific details.
>
> Thanks a lot,
> Conrad
>

Conrad
--
Conrad Hoffmann
Baptiste Assmann
Re: [PATCHES] 3 patches for DNS SRV records
August 11, 2017 06:00PM
Hi Aleksandar,

Thanks for your feedback.

> > In one line, we can enable automatic "scaling follow-up" in HAProxy.
> ... for headless services only, right?

Well, I think I've already seen my Kubernetes friends distributing IPs
even for nodePort deployments.

> 8-O. I don't say the word amazing very often, but now it fits.
>
> That's amazing ;-)

Thanks :)
Glad to see our efforts as a team make you happy (and I guess you're
not the only one :) )

> It would be interesting, at least for me, to have haproxy as a 
> 'service controller', instead of the kube-proxy ;-)
>
> Do you also use haproxy for ingress in your kubernetes cluster?

Yes I do.

> f. e.
> https://github.com/kubernetes/ingress/tree/master/examples/deployment/haproxy

I don't use that one; I built my own with a quick'n'dirty Python
script, for testing purposes only...
For production purposes, HAProxy Technologies contributes to the haproxy
ingress implementation in Kubernetes (the one you linked). This
implementation is based on HAProxy stable and does not take SRV records
into account yet (it should be updated once HAProxy 1.8.0 is available).

Baptiste
Baptiste Assmann
Re: [PATCHES] 3 patches for DNS SRV records
August 11, 2017 06:20PM
Hi Conrad,


> first of all: great to see that this is making progress! I am very excited
> about everything related to SRV records and also server-templates. I tested
> a fresh master build with these patches applied, here are my observations:

Thanks a lot for taking time to test and report your findings!


>
> On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
> >
> > Hi All
> >
> > So, I enabled the latest (brilliant) contribution from Olivier in my
> > Kubernetes cluster and discovered it did not work as expected.
> > After digging into the issues, I found 3 bugs directly related to the
> > way SRV records must be read and processed by HAProxy.
> > It was clearly hard to spot them outside a real orchestrator :)
> >
> > Please find attached 3 patches to fix them.
> >
> > Please note that I might have found another bug, which I'll dig into
> > later.
> > When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
> > considers some servers (pods in Kubernetes) in error "no dns
> > resolution". This is normal. What is not normal is that those servers
> > never come back to life, even when I scale up again...
> >
> > Note that thanks to Fred's (Salut!) server-template contribution
> > some time ago, we can do some very fancy configurations like the
> > one below (I have a headless service called 'red' in my Kubernetes
> > cluster that points to my 'red' application):
> >
> > backend red
> >   server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check
> >
> > In one line, we can enable automatic "scaling follow-up" in HAProxy.
> I tried a very similar setup, like this:
>
> >
> >  resolvers servicediscovery
> >    nameserver dns1 10.33.60.31:53
> >    nameserver dns2 10.33.19.32:53
> >    nameserver dns3 10.33.25.28:53
> >
> >    resolve_retries       3
> >    timeout retry         1s
> >    hold valid           10s
> >    hold obsolete         5s
> >
> >  backend testbackend
> >    server-template test 20 http.web.production.<internal-name>:80
> > check
> This is the first time I am testing the server-template keyword at all, but
> I immediately noticed that I sometimes get a rather uneven distribution of
> pods, e.g. this (with the name resolving to 5 addresses):
>
> >
> > $ echo "show servers state testbackend" | \
> >    nc localhost 2305 | grep testbackend | \
> >    awk '{print $5}' | sort | uniq -c
> >      7 10.146.112.130
> >      6 10.146.148.92
> >      3 10.146.172.225
> >      4 10.146.89.208
> This uses only four of the five servers, with a quite uneven distribution.
> Other attempts do use all five servers, but the distribution still
> seems pretty uneven most of the time. Is that intentional? Is the list
> populated randomly?

Nope, each IP read in the response should be assigned to a single
server.
If that IP disappears, then the server will be considered DOWN after
some time.
If new IPs arrive, then they'll be assigned to available or DOWN
servers.

>
> Then, nothing changed when I scaled up or down (except the health checks
> taking some servers offline); the addresses were never updated. Is that
> the bug you mentioned, or am I doing it wrong?

Well, you're supposed to see some changes, but as I said in my previous
mail, we seem to have a last bug to fix: some servers that go DOWN
during a scale-in never come back up during the next scale-out...


> Also, as more of a side note, we do use SRV records, but without underscores
> in the names, which I realize is not very common, but also not exactly
> forbidden (as far as I understand the RFC, it's more of a suggestion).

I tend to disagree: https://www.ietf.org/rfc/rfc2782.txt
=======8<======
The format of the SRV RR
   Here is the format of the SRV RR, whose DNS type code is 33:
        _Service._Proto.Name TTL Class SRV Priority Weight Port Target
=======8<======


Kubernetes seems to be tolerant: my SRV query for
_http._tcp.red.default.svc.cluster.local returns the same result as
red.default.svc.cluster.local.
From the Kubernetes documentation, it seems they first implemented the
version without the underscore and kept it for compatibility purposes:
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
=====8<=====
Backwards compatibility
Previous versions of kube-dns made names of the form
my-svc.my-namespace.cluster.local (the ‘svc’ level was added later). This
is no longer supported.
=====8<=====

> It would be great if this could be indicated in some way in the config,
> maybe.

Well, I don't agree, as explained above :)
That said, technically, this may be doable by playing with the
"resolve-prefer" parameter. For now it accepts only 'ipv4' and 'ipv6',
but we could add 'srv'...
I expect some feedback from the community on this particular point.

> And lastly, I know this isn't going to be solved on a Friday afternoon, but
> I'll let you know that our infrastructure has reached a scale where DNS
> over UDP almost never cuts it anymore (due to the number of records
> returned), and I think many people who are turning to e.g. Kubernetes do so
> because they have to operate at such scale, so my guess is this might be
> one of the more frequently requested features at some point :)

May I ask how many records you can return at most?

Well, Olivier implemented a "time to live" for the records in the
cache. I mean that your pod's IP must go unseen for the "hold obsolete"
period of time before the server associated with it is considered
DOWN.
So even if the whole set of servers can't fit in a single response,
with some luck, we'll see each one often enough to avoid disabling it...
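As a rough sketch of why the whole set rarely fits, here is a
back-of-envelope count of how many SRV records a classic 512-byte UDP
response can carry. All sizes are assumptions: a 12-byte header, a
~40-byte question section, and ~70 bytes per SRV record once the
compressed owner name, fixed RR fields, and target name are counted:

```shell
# Back-of-envelope: SRV records per 512-byte UDP DNS response.
# Assumed sizes (see above); real payloads vary with name lengths
# and compression, and additional A records shrink the budget further.
HEADER=12
QUESTION=40
PER_SRV=70
echo $(( (512 - HEADER - QUESTION) / PER_SRV ))
# prints 6
```

With only half a dozen records per answer, a deployment of 20+ pods can
only ever be seen piecewise over UDP, which is exactly where the "hold
obsolete" grace period matters.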

Well, that said, I do agree with you: we may need to implement DNS over
TCP at some point. We're just waiting for some more feedback on this point.

Note that I may soon implement EDNS to announce that HAProxy can support
bigger DNS responses. This is not ideal, but it may be used as a quick
and dirty workaround until we have something more reliable.


> This is just some "quick" feedback; depending on the time I have, I'll try
> to take a closer look at a few things and provide more details if possible.
>
> Again, thanks a lot for working on this, let me know if you are interested
> in any specific details.

You're welcome. I'm just interested in how many SRV records you could
get at most in a response. That would be very helpful.

Baptiste
Baptiste
Re: [PATCHES] 3 patches for DNS SRV records
August 22, 2017 05:40PM
Hey all,

We fixed a few bugs in the DNS SRV record support and added support for
the DNS extension mechanism (to announce that we can accept a "big"
response payload).
It is now safer to run HAProxy 1.8-dev with service discovery tools
(Kubernetes, Consul, etc.) and get updates using SRV records. Don't
forget to combine this with server-template, such as:

backend red
  server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 resolvers kube inter 1s check resolve-prefer ipv4
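To check how the template slots get filled, the "show servers state"
pipeline posted earlier in this thread can be reused. The sketch below
runs it on two canned sample lines so the counting step is visible; in a
live setup the input would come from the runtime API instead (the socket
path and the sample rows are assumptions):

```shell
# Count server-template slots per discovered address.
# Live input would come from the stats socket, e.g.:
#   echo "show servers state red" | socat stdio /var/run/haproxy.sock
# The printf below stands in for that output (column 5 = address).
printf '%s\n' \
  '3 red 1 red1 10.146.112.130' \
  '3 red 2 red2 10.146.112.130' \
  '3 red 3 red3 10.146.148.92' |
awk '{print $5}' | sort | uniq -c
```

An even spread of slot counts across addresses is what you'd expect
once the SRV records are being applied correctly.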

Enjoy and report any issues!!!

Baptiste