Server-template and randomized DNS responses

Posted by Чепайкин Михаил 
Чепайкин Михаил
Server-template and randomized DNS responses
February 07, 2018 03:00PM
Hi!

I use Consul as a service discovery tool and HAProxy as a load balancer.

Consul has a registered service that runs on a number of servers. The
service can be scaled by adding and removing nodes, and nodes can move from
one server to another.

Consul's DNS interface randomizes the order of its responses for services like that:

[bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
10.182.161.239
10.182.161.152
10.182.161.240
10.182.161.92
[bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
10.182.161.92
10.182.161.152
10.182.161.240
10.182.161.239
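
A quick way to confirm that the two answers above differ only in order is to compare them as sets. This is a minimal Python sketch: the two lists are copied from the dig output, and the helper name same_members is made up for illustration.

```python
def same_members(first, second):
    """Return True when both answers contain the same addresses, ignoring order."""
    return set(first) == set(second)


# Addresses copied from the two dig runs above.
answer_1 = ["10.182.161.239", "10.182.161.152", "10.182.161.240", "10.182.161.92"]
answer_2 = ["10.182.161.92", "10.182.161.152", "10.182.161.240", "10.182.161.239"]

print(same_members(answer_1, answer_2))  # True: same set of addresses
print(answer_1 == answer_2)              # False: the order differs
```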

In HAProxy 1.8.3 I'm using a server-template configuration like this:

resolvers dns
nameserver dns1 ${HAPROXY_NAMESERVER}
hold valid 2s

backend tsdb_backend_query
server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000

With that configuration I get a lot of warnings in the HAProxy log:

time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
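
To see how often each server slot flips, the "changed its IP" warnings can be counted per slot. The following is a minimal Python sketch, assuming the message wording shown in the log above; the function name count_ip_changes and the shortened sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the "changed its IP" warning; the wording is taken from the log above.
CHANGE_RE = re.compile(
    r"(?P<backend>[\w.-]+)/(?P<server>[\w.-]+) changed its IP from "
    r"(?P<old>\d+(?:\.\d+){3}) to (?P<new>\d+(?:\.\d+){3}) by DNS cache"
)


def count_ip_changes(log_text):
    """Count how many times each backend/server slot changed its IP."""
    # Collapse line wraps and repeated spaces so wrapped log entries still match.
    normalized = " ".join(log_text.split())
    counts = Counter()
    for match in CHANGE_RE.finditer(normalized):
        counts[f"{match['backend']}/{match['server']}"] += 1
    return counts


# Shortened sample lines (timestamps and job fields omitted).
sample = (
    "(32983) : tsdb_backend_query/tsdb_query1 changed its IP from "
    "10.182.161.240 to 10.182.161.239 by DNS cache.\n"
    "(32983) : tsdb_backend_query/tsdb_query1 changed its IP from "
    "10.182.161.239 to 10.182.161.240 by DNS cache.\n"
    "(32983) : tsdb_backend_query/tsdb_query3 changed its IP from "
    "10.182.161.152 to 10.182.161.239 by DNS cache.\n"
)

for slot, changes in count_ip_changes(sample).items():
    print(slot, changes)
```

A slot that keeps bouncing between the same few addresses, as tsdb_query1 does here, suggests reordering of a stable answer rather than real changes in the service.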

This doesn't really break the service, but I don't think it is quite normal.

Any advice on how to resolve this issue?
--
Mike Chepaykin
Baptiste
Re: Server-template and randomized DNS responses
February 07, 2018 11:30PM
Hi

You're not using SRV records and that may be the root cause of your issue.
Please try something like this:

backend tsdb_backend_query
server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000

This assumes "mfm-monitor-opentsdb" is your service name in Consul.
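
The SRV name in the example follows the pattern _<service>._<protocol>.service.<domain>. As a small Python sketch of how that name is assembled (the helper consul_srv_name is hypothetical, and "mfmconsul" is the DNS domain used in this thread):

```python
def consul_srv_name(service, domain, protocol="tcp"):
    """Build an SRV query name of the form _<service>._<protocol>.service.<domain>."""
    return f"_{service}._{protocol}.service.{domain}"


print(consul_srv_name("mfm-monitor-opentsdb", "mfmconsul"))
# _mfm-monitor-opentsdb._tcp.service.mfmconsul
```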

Baptiste



Чепайкин Михаил
Re: Server-template and randomized DNS responses
February 08, 2018 12:40AM
Hi

I've changed the configuration as you suggested:

backend tsdb_backend_query
server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000

The logs are somewhat different: backend servers now go DOWN and UP, but the
behavior seems the same, with IP addresses changing in the same way:

time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.98 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
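
The hostnames in the UP/READY lines look like the target IPv4 address written as eight hex digits. That is an inference from lining them up with the "changed its IP" entries in the same log (for example 0ab6a1d3 next to 10.182.161.211), not something stated in this thread. A minimal Python sketch to check it, with decode_addr_label as a hypothetical helper name:

```python
import ipaddress


def decode_addr_label(hostname):
    """Decode the leading hex label of '<hex>.addr.<dc>.<domain>' into an IPv4 address."""
    hex_label = hostname.split(".", 1)[0]
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_label)))


# Hostnames copied from the log above.
for name in (
    "0ab6a1d3.addr.dc1.mfmconsul",
    "0ab6a1df.addr.dc1.mfmconsul",
    "0ab6a162.addr.dc1.mfmconsul",
):
    print(name, "->", decode_addr_label(name))
```

The decoded values match the new addresses reported for the same server slots, so each SRV target appears to map to one concrete node address.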

Any thoughts?

--
Mike Chepaykin
Baptiste
Re: Server-template and randomized DNS responses
February 11, 2018 04:50PM
Hi,

Which Consul version are you using?
I'm seeing the same issue in my Consul lab. That said, it seems to be a bug
in Consul, which is not able to serve too many SRV records over UDP.
I even triggered a Consul crash (using version 1.0.5).
I'm still investigating this issue and will come back to you as soon as I
have more reliable information.
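
To get a feel for why the number of SRV records matters over UDP, here is a rough Python sketch that estimates the size of an SRV answer. It is only an upper-bound style estimate under stated assumptions: no DNS name compression, no additional-section records, and every answer repeating the full query name. Real servers compress names, so actual responses will usually be smaller. The 512-byte figure is the classic DNS-over-UDP message limit without EDNS0; the function names are made up for illustration.

```python
CLASSIC_UDP_LIMIT = 512  # bytes, DNS over UDP without EDNS0


def wire_name_len(name):
    """Length of a domain name in uncompressed DNS wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # Each label costs one length byte plus its characters; +1 for the root label.
    return sum(len(label) + 1 for label in labels) + 1


def estimate_srv_response_size(query_name, targets):
    """Rough size estimate, ignoring name compression and additional records."""
    header = 12
    question = wire_name_len(query_name) + 4  # QTYPE + QCLASS
    fixed_rr = 10                             # TYPE, CLASS, TTL, RDLENGTH
    srv_fixed = 6                             # priority, weight, port
    answers = sum(
        wire_name_len(query_name) + fixed_rr + srv_fixed + wire_name_len(target)
        for target in targets
    )
    return header + question + answers


# Names taken from this thread; repeating one target five times is illustrative.
query = "_mfm-monitor-opentsdb._tcp.service.mfmconsul"
targets = ["0ab6a1d3.addr.dc1.mfmconsul"] * 5
size = estimate_srv_response_size(query, targets)
print(size, "bytes; over classic UDP limit:", size > CLASSIC_UDP_LIMIT)
```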

Note: please ensure the number of servers created by the server-template
directive (5 in your case) is above the expected number of servers available
in your service.
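
The sizing rule in the note can be written as a one-line check. This is just an illustrative Python sketch: has_enough_slots is a hypothetical helper, and the count of 4 comes from the four addresses in the dig output at the start of the thread.

```python
def has_enough_slots(template_slots, expected_servers):
    """True when server-template creates more slots than the service is expected to need."""
    return template_slots > expected_servers


# 5 slots from "server-template tsdb_query 5", 4 addresses seen in the dig output.
print(has_enough_slots(5, 4))  # True
```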

Baptiste



Чепайкин Михаил
Re: Server-template and randomized DNS responses
February 12, 2018 08:40AM
I'm on Consul 1.0.2.

Why do you think this issue is about serving SRV records over UDP, rather
than about the different order of SRV or A records that Consul DNS returns
on consecutive requests?

On 11 February 2018 at 18:46, Baptiste <[email protected]> wrote:

> Hi,
>
> What consul version are you using?
> I'm facing the same issue in my consul lab. That said, it seems to be a
> bug in consul, not able to serve too many SRV records over UDP.
> I even triggered a consul crash (using 1.0.5 version).
> I'm still investigating this issue and will come back to you as soon as I
> have more reliable information.
>
> Note: please ensure the number of server created by server-template
> directive (5 in your case) is above the expected number of server available
> in your service.
>
> Baptiste
>
>
>
> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <[email protected]>
> wrote:
>
>> Hi
>>
>> I’ve changed configuration as you suggested:
>>
>> backend tsdb_backend_query
>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>
>> Logs are kinda different - backend servers now go UP and DOWN, but seems
>> the same - ip addresses changing in the same way:
>>
>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161..223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161..98 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161..223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161..211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>
>> Any thoughts?
>>
>> On 8 February 2018 at 01:25, Baptiste <[email protected]> wrote:
>>
>> Hi
>>>
>>> You're not using SRV records and that may be the root cause of your
>>> issue.
>>> Please try something like this:
>>>
>>> backend tsdb_backend_query
>>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>
>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>
>>> Baptiste
>>>
>>>
>>>
>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <[email protected]>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> I have a Consul as service discovery tool and HAProxy as load balancer.
>>>>
>>>> In Consul registered a service running on a number of servers, and this
>>>> service can be scaled by adding and removing nodes and by moving nodes from
>>>> one server to another.
>>>>
>>>> Consul has DNS service which randomizes responses for services like
>>>> that:
>>>>
>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>> 10.182.161.239
>>>> 10.182.161.152
>>>> 10.182.161.240
>>>> 10.182.161.92
>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>> 10.182.161.92
>>>> 10.182.161.152
>>>> 10.182.161.240
>>>> 10.182.161.239
>>>>
>>>> In HAProxy 1.8.3 im using server-template configuration, like that:
>>>>
>>>> resolvers dns
>>>> nameserver dns1 ${HAPROXY_NAMESERVER}
>>>> hold valid 2s
>>>>
>>>> backend tsdb_backend_query
>>>> server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>
>>>> And in that case I get alot of warinings in haproxy log:
>>>>
>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>
>>>> This isn’t really break the service, but I think this is not quite
>>>> normal.
>>>>
>>>> Any advise on how to resolve this issue?
>>>>
>>>
--
Mike Chepaykin
Baptiste
Re: Server-template and randomized DNS responses
February 12, 2018 09:30AM
First, I can confirm the following bug in Consul 1.0.5:
- start X instances of a service
- scale the service to X+Y (with Y > 1)
==> Consul then crashes...
From time to time, I also saw HAProxy getting only 10 of the 20 servers for a
given service.

I'll revert to 1.0.2 for now.

HAProxy ignores the order of the returned SRV records.
Can you confirm the number of servers associated with the service
'mfm-monitor-opentsdb' in Consul?
On the HAProxy box, can you run the following command and send back the output
(obfuscating the IPs and other sensitive information):
dig +notcp @127.0.0.1 -p 8600 -t SRV _mfm-monitor-opentsdb._tcp.service.consul

Baptiste



On Mon, Feb 12, 2018 at 8:27 AM, Чепайкин Михаил <[email protected]>
wrote:

> Im on Consul 1.0.2.
>
> Why do you think this issue is about serving SRV over UDP, rather than
> about different order of SRV or A records returned by Consul DNS with
> consecutive requests?
>
> On 11 February 2018 at 18:46, Baptiste <[email protected]> wrote:
>
>> Hi,
>>
>> What consul version are you using?
>> I'm facing the same issue in my consul lab. That said, it seems to be a
>> bug in consul, not able to serve too many SRV records over UDP.
>> I even triggered a consul crash (using 1.0.5 version).
>> I'm still investigating this issue and will come back to you as soon as I
>> have more reliable information.
>>
>> Note: please ensure the number of server created by server-template
>> directive (5 in your case) is above the expected number of server available
>> in your service.
>>
>> Baptiste
>>
>>
>>
>> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <[email protected]>
>> wrote:
>>
>>> Hi
>>>
>>> I’ve changed configuration as you suggested:
>>>
>>> backend tsdb_backend_query
>>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>
>>> Logs are kinda different - backend servers now go UP and DOWN, but seems
>>> the same - ip addresses changing in the same way:
>>>
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.98 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>
>>> Any thoughts?
>>>
>>> On 8 February 2018 at 01:25, Baptiste <[email protected]> wrote:
>>>
>>> Hi
>>>>
>>>> You're not using SRV records and that may be the root cause of your
>>>> issue.
>>>> Please try something like this:
>>>>
>>>> backend tsdb_backend_query
>>>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>
>>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>>
>>>> Baptiste
>>>>
>>>>
>>>>
>>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have a Consul as service discovery tool and HAProxy as load balancer.
>>>>>
>>>>> In Consul registered a service running on a number of servers, and
>>>>> this service can be scaled by adding and removing nodes and by moving nodes
>>>>> from one server to another.
>>>>>
>>>>> Consul has DNS service which randomizes responses for services like
>>>>> that:
>>>>>
>>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>> 10.182.161.239
>>>>> 10.182.161.152
>>>>> 10.182.161.240
>>>>> 10.182.161.92
>>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>> 10.182.161.92
>>>>> 10.182.161.152
>>>>> 10.182.161.240
>>>>> 10.182.161.239
>>>>>
>>>>> In HAProxy 1.8.3 im using server-template configuration, like that:
>>>>>
>>>>> resolvers dns
>>>>> nameserver dns1 ${HAPROXY_NAMESERVER}
>>>>> hold valid 2s
>>>>>
>>>>> backend tsdb_backend_query
>>>>> server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>
>>>>> And in that case I get alot of warinings in haproxy log:
>>>>>
>>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>
>>>>> This isn’t really break the service, but I think this is not quite
>>>>> normal.
>>>>>
>>>>> Any advise on how to resolve this issue?
>>>>>
>>>>
> --
> Mike Chepaykin
>
>
Baptiste
Re: Server-template and randomized DNS responses
February 12, 2018 10:30AM
Continuing my investigation, I found another interesting piece of
information:
I run HAProxy and my Consul environment on a Docker host, through
docker-compose, and I can reproduce the same issue as you.
Basically, I have a service delivered by 20 containers, and HAProxy in
Docker can see only 10 of them and keeps switching their IPs all the time...
That said, if I run the same HAProxy binary on my laptop, pointing its DNS
resolvers to the Consul client running on my Docker host, everything works
smoothly!!!

In my case, one thing might be happening: Docker drops DNS responses (UDP)
that are too big, and my HAProxy falls back to 512 bytes, where only 10 SRV
records can fit (Consul also returns A and TXT records for each SRV
response).
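
As a rough sanity check on the 512-byte limit, here is a minimal Python sketch
that estimates how many servers fit in one UDP DNS response. It is only an
estimate under stated assumptions, not a description of what Consul or HAProxy
actually do: owner names are assumed to be compressed to 2-byte pointers, the
SRV target is assumed to be written uncompressed, each server is assumed to
cost one SRV record plus one A record, and the optional TXT record is modeled
by the hypothetical `txt_len` parameter. The names are taken from this thread;
`max_servers` and `encoded_name_len` are illustrative helpers.

```python
HEADER = 12    # fixed DNS header size in bytes
RR_FIXED = 10  # type(2) + class(2) + ttl(4) + rdlength(2)
POINTER = 2    # assumed: owner names are compressed to a 2-byte pointer


def encoded_name_len(name: str) -> int:
    """Length in bytes of a DNS name in uncompressed wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # one length byte per label, plus the terminating root byte
    return sum(1 + len(label) for label in labels) + 1


def max_servers(qname: str, target: str, payload_size: int, txt_len: int = 0) -> int:
    """Estimate how many servers (SRV + A [+ TXT]) fit in payload_size bytes."""
    question = encoded_name_len(qname) + 4  # qname + qtype(2) + qclass(2)
    # SRV rdata: priority(2) + weight(2) + port(2) + uncompressed target name
    srv = POINTER + RR_FIXED + 6 + encoded_name_len(target)
    a_record = POINTER + RR_FIXED + 4  # IPv4 address
    # TXT rdata: one length byte + text; skipped when txt_len is 0
    txt = (POINTER + RR_FIXED + 1 + txt_len) if txt_len else 0
    per_server = srv + a_record + txt
    return (payload_size - HEADER - question) // per_server


if __name__ == "__main__":
    qname = "_mfm-monitor-opentsdb._tcp.service.mfmconsul"
    target = "0ab6a1df.addr.dc1.mfmconsul"
    for size in (512, 8192):
        print(size, max_servers(qname, target, size))
```

The exact count depends on name lengths and on which extra records the server
adds, so this will not necessarily match the 10 records I see in my lab; the
point is only that a 512-byte response holds far fewer servers than a larger
one.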

I tested both the latest 1.8 and 1.9-dev and can report the same issue in both
cases.

Could you tell me more about your environment (drop the ML if there is too
much sensitive information)?

Baptiste


Baptiste
Re: Server-template and randomized DNS responses
February 12, 2018 10:50AM
Replying to myself :)

I think I spotted a bug in HAProxy as well.
For some reason, when I run HAProxy in debug mode, I never have the
issue (all my servers are properly populated and maintained).

I did a strace of the process running in daemon mode in the container, and
I can confirm the following behavior:
- the first request is sent honoring the EDNS extension (to get big responses),
up to 8K
- the first response comes back with the full list (around 2K of data)
- the second request is sent with the default accepted payload size (around 1K)
- the second response comes back with partial records
- the mess starts
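
To make the difference between those two requests concrete, here is a minimal
Python sketch that builds an SRV query with and without an EDNS0 OPT record.
It is not HAProxy's code, only an illustration of the wire format (RFC 1035
and RFC 6891): the accepted UDP payload size travels in the CLASS field of the
OPT pseudo-record, and a query without OPT implies the classic 512-byte limit.
The function names and the query ID are made up for the example.

```python
import struct

TYPE_SRV = 33  # SRV record type
TYPE_OPT = 41  # EDNS0 OPT pseudo-record type
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a DNS name as length-prefixed labels ending with a root byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_srv_query(qname: str, query_id: int, edns_payload_size=None) -> bytes:
    """Build an SRV query; add an OPT record if a payload size is given."""
    arcount = 1 if edns_payload_size is not None else 0
    # header: id, flags (recursion desired), qdcount, ancount, nscount, arcount
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    question = encode_name(qname) + struct.pack("!HH", TYPE_SRV, CLASS_IN)
    packet = header + question
    if edns_payload_size is not None:
        # OPT: root name, TYPE=OPT, CLASS=accepted payload size,
        # TTL=extended rcode/version/flags (all zero), RDLENGTH=0
        packet += b"\x00" + struct.pack("!HHIH", TYPE_OPT, edns_payload_size, 0, 0)
    return packet


if __name__ == "__main__":
    name = "_mfm-monitor-opentsdb._tcp.service.mfmconsul"
    big = build_srv_query(name, 0x1234, edns_payload_size=8192)
    plain = build_srv_query(name, 0x1234)
    print(len(big), len(plain))
```

If the advertised size shrinks between two consecutive queries, the server has
to fit its answer into the smaller limit, which is consistent with the partial
record list seen in the second response.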

Now that I can reproduce the bug, I'm going to investigate what's happening and
provide a fix asap.

Thanks again, Mike, for reporting!!!

Baptiste



On Mon, Feb 12, 2018 at 10:17 AM, Baptiste <[email protected]> wrote:

> Continuing on my investigation I found an other interesting piece of
> information:
> I run haproxy and my consul environment in a docker host, through
> docker-compose and I can reproduce the same issue as you.
> Basically, I have a service delivered by 20 containers, and HAProxy in
> docker can see only 10 of them and switches all their IPs all the time...
> That said, if I run the same HAProxy binary on my laptop, pointing it's
> DNS resolvers to the consul client running in my docker host, everything
> works smoothly!!!
>
> In my case, there is one thing that might happen: docker drops too big DNS
> responses (UDP) and my HAProxy failover to 512 bytes only where only 10 SRV
> records could stand (consul also returns A and TXT records for each SRV
> response).
>
> I tested both latest 1.8 and 1.9-dev and can report same issue in both
> cases.
>
> Could you tell me more about your environment (drop the ML if there are
> too many sensitive information)
>
> Baptiste
>
>
> On Mon, Feb 12, 2018 at 9:25 AM, Baptiste <[email protected]> wrote:
>
>> First, I confirm the following bug in consul 1.0.5:
>> - start a X instances of a service
>> - scale the service to X+Y (with Y > 1)
>> ==> then consul crashes...
>> From time to time, I also saw HAProxy getting only 10 servers from 20 for
>> a given service.
>>
>> I'll revert to 1.0.2 for now.
>>
>> The order of the returned SRV records is ignored by HAProxy.
>> Can you confirm the number of servers associated to the service '
>> mfm-monitor-opentsdb' in consul?
>> On the HAProxy box, can you run the following command and return the
>> output (obfuscating the IPs and other sensible information)
>> dig +notcp @127.0.0.1 -p 8600 -t SRV _mfm-monitor-opentsdb._tcp.service.consul
>>
>> Baptiste
>>
>>
>>
>> On Mon, Feb 12, 2018 at 8:27 AM, Чепайкин Михаил <[email protected]>
>> wrote:
>>
>>> Im on Consul 1.0.2.
>>>
>>> Why do you think this issue is about serving SRV over UDP, rather than
>>> about different order of SRV or A records returned by Consul DNS with
>>> consecutive requests?
>>>
>>> On 11 February 2018 at 18:46, Baptiste <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> What consul version are you using?
>>>> I'm facing the same issue in my consul lab. That said, it seems to be a
>>>> bug in consul, which is not able to serve too many SRV records over UDP.
>>>> I even triggered a consul crash (using version 1.0.5).
>>>> I'm still investigating this issue and will come back to you as soon as
>>>> I have more reliable information.
>>>>
>>>> Note: please ensure the number of servers created by the server-template
>>>> directive (5 in your case) is above the expected number of servers
>>>> available in your service.
>>>>
>>>> Baptiste
>>>>
>>>>
>>>>
>>>> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I’ve changed configuration as you suggested:
>>>>>
>>>>> backend tsdb_backend_query
>>>>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>
>>>>> The logs are somewhat different: backend servers now go UP and DOWN, but
>>>>> the behaviour seems the same, with IP addresses changing in the same way:
>>>>>
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.98 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 (18208) : Server tsdb_backend_query/tsdb_query3 ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 (18208) : Server tsdb_backend_query/tsdb_query1 ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> On 8 February 2018 at 01:25, Baptiste <[email protected]> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> You're not using SRV records and that may be the root cause of your
>>>>>> issue.
>>>>>> Please try something like this:
>>>>>>
>>>>>> backend tsdb_backend_query
>>>>>> server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>>
>>>>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>>>>
>>>>>> Baptiste
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <[email protected]> wrote:
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have Consul as a service discovery tool and HAProxy as a load
>>>>>>> balancer.
>>>>>>>
>>>>>>> A service registered in Consul runs on a number of servers, and
>>>>>>> this service can be scaled by adding and removing nodes and by moving nodes
>>>>>>> from one server to another.
>>>>>>>
>>>>>>> Consul has a DNS service which randomizes responses for services, like
>>>>>>> this:
>>>>>>>
>>>>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>>> 10.182.161.239
>>>>>>> 10.182.161.152
>>>>>>> 10.182.161.240
>>>>>>> 10.182.161.92
>>>>>>> [bux] [email protected]:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>>> 10.182.161.92
>>>>>>> 10.182.161.152
>>>>>>> 10.182.161.240
>>>>>>> 10.182.161.239
>>>>>>>
>>>>>>> In HAProxy 1.8.3 I'm using a server-template configuration like this:
>>>>>>>
>>>>>>> resolvers dns
>>>>>>> nameserver dns1 ${HAPROXY_NAMESERVER}
>>>>>>> hold valid 2s
>>>>>>>
>>>>>>> backend tsdb_backend_query
>>>>>>> server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>>>
>>>>>>> In that case I get a lot of warnings in the haproxy log:
>>>>>>>
>>>>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy pid=32983
>>>>>>>
>>>>>>> This doesn't really break the service, but I think it is not quite
>>>>>>> normal.
>>>>>>>
>>>>>>> Any advice on how to resolve this issue?
>>>>>>>
>>>>>>
>>> --
>>> Mike Chepaykin
>>>
>>>
>>
>
Baptiste
Re: Server-template and randomized DNS responses
February 12, 2018 09:50PM
To share the solution with everyone: the problem was fixed by a
configuration update.
Mike added "accepted_payload_size 1024" to his resolvers section.

By default, HAProxy announces an accepted payload size of 512 bytes, which
leaves room for only 3 of the records reported by consul.
With a payload of 1024 bytes, up to 10 servers can be reported by consul,
which is large enough for Mike.
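
For reference, the change can be sketched roughly as below. This is only a
sketch assembled from the configuration Mike posted earlier in the thread;
the exact position of the accepted_payload_size line inside the resolvers
section is an assumption, since only the directive and its value were
reported.

```
resolvers dns
    nameserver dns1 ${HAPROXY_NAMESERVER}
    # Announce a larger accepted DNS payload than the 512-byte default,
    # so that more records fit in a single UDP response.
    # (Placement within the section is assumed.)
    accepted_payload_size 1024
    hold valid 2s

backend tsdb_backend_query
    # Unchanged from the SRV-based configuration quoted in this thread.
    server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
```

A service with more instances would probably need a larger value still,
together with a larger server-template count.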

HAProxy can announce up to 8K of accepted DNS payload. That said, while
troubleshooting Mike's case, I found a bug where HAProxy automatically
reduces this to 1280 bytes under some conditions.
This bug is not related to Mike's case, but it deserves a fix. I'll work on
it asap.

Baptiste

