Testing Stick table replication with snapshot 20120222

Mark Brooks
Testing Stick table replication with snapshot 20120222
February 22, 2012 02:40PM
Hi all,

We have been testing stick table replication and were wondering if we
could get some clarification on its operation, and possibly make a
feature request if what we think is happening is indeed happening.

Our configuration is as follows -


global
        daemon
        stats socket /var/run/haproxy.stat mode 600 level admin
        pidfile /var/run/haproxy.pid
        maxconn 40000
        ulimit-n 81000
defaults
        mode http
        balance roundrobin
        timeout connect 4000
        timeout client 42000
        timeout server 43000
peers loadbalancer_replication
        peer instance1 192.168.66.94:7778
        peer instance2 192.168.66.95:7778
listen VIP_Name
        bind 192.100.1.2:80
        mode tcp
        balance leastconn
        server backup 127.0.0.1:9081 backup non-stick
        stick-table type ip size 10240k expire 30m peers loadbalancer_replication
        stick on src
        option redispatch
        option abortonclose
        maxconn 40000
        server RIP_Name 192.168.66.50 weight 1 check port 80 inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
        server RIP_Name-1 192.168.66.51:80 weight 1 check inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
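
As a side note, the way we check a table's contents on each node is via
the admin socket configured above, with something like the following
(assuming socat is installed; the table name matches the listen section
name):

    echo "show table VIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat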



I have replication working between the devices; our issues come when
one of the nodes is lost and then brought back online.
For example -

We have 2 copies of haproxy running on 2 machines, called instance 1
and instance 2.

Starting setup
instance 1's persistence table
entry 1
entry 2
entry 3

instance 2's persistence table
entry 1
entry 2
entry 3


instance 1 now fails and is no longer communicating with instance 2.
All the users are now connected to instance 2.

Now instance 1 is brought back online.

The users are still connecting to instance 2, but the persistence
table entries from instance 2 are only copied to instance 1 when a
connection is re-established (we see it as the persistence timeout
counter resetting).

So you can end up with

instance 1's persistence table
entry 1


instance 2's persistence table
entry 1
entry 2
entry 3

If you were to then cause the connections to switch over from instance
2 to instance 1, you would be missing 2 persistence entries.

Is this expected behaviour?

If it is, would it be possible to request a feature for a socket
command of some sort which could be run on a device to force-synchronise
the persistence table with the other peers? Whichever instance it is
run on would take its persistence table and push it to the other peers.


Mark
Baptiste
Re: Testing Stick table replication with snapshot 20120222
February 22, 2012 05:54PM
Hey,

Why not use "balance source" instead of stick tables to do IP source
affinity?
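
For illustration, that would look something like this in the listen
section (a rough sketch based on the configuration above, not a tested
drop-in replacement):

    listen VIP_Name
        bind 192.100.1.2:80
        mode tcp
        balance source
        server RIP_Name 192.168.66.50 weight 1 check port 80 inter 2000 rise 2 fall 3
        server RIP_Name-1 192.168.66.51:80 weight 1 check inter 2000 rise 2 fall 3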

Note that the behavior you're observing is not a bug, it's by design
:) There is no master for the table.
Each process only pushes "writes" to its peers.
So an improvement could be to give HAProxy the ability to push a
table's content to a peer, or at least, when a peer has no entry in its
table for a request, to ask its peers whether they have one.
But this would slow down your traffic.

Before you ask for it, be aware that only the IPs and the affected
server are synchronised in the tables. No counters are synced.

cheers


On Wed, Feb 22, 2012 at 2:32 PM, Mark Brooks <[email protected]> wrote:
> (...)
Willy Tarreau
Re: Testing Stick table replication with snapshot 20120222
February 23, 2012 08:30AM
Hi,

On Wed, Feb 22, 2012 at 05:54:48PM +0100, Baptiste wrote:
> Hey,
>
> Why not using "balance source" instead of using stick tables to do ip
> source affinity?

The main difference between "balance source" and "stick on src" is that
with the former, when you lose a server, all clients are redistributed
while in the second only the clients attached to the failed server are
redistributed. Also when the failed server comes back, clients are moved
again with "balance source". "balance source" + "hash-type consistent"
at least fixes the first issue but the second one remains. I'm not fond of
stick tables at all, but I must admit they address real world issues :-)
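
As a rough sketch (an illustration only, not a tested configuration),
the consistent-hashing variant would be expressed in the listen section
as:

    balance source
    hash-type consistent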

> Note that the behavior you're observing is not a bug, it's by design
> :) There is no master on the table.

Upon restart, there is a special state where the new peer connects to
other ones and asks them to dump all of their tables contents. So this
issue should not happen at all or it's a bug. We already observed this
behaviour during the development of the feature, but it's never been
observed since it was released. Maybe we recently broke something. Mark,
what version are you using ? Do you have any patches applied ?
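
In case it helps, the exact version and build options on each node can
be dumped with, for example:

    haproxy -vv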

Regards,
Willy
Mark Brooks
Re: Testing Stick table replication with snapshot 20120222
February 23, 2012 04:50PM
On 23 February 2012 07:20, Willy Tarreau <[email protected]> wrote:
> (...)

Thanks Willy. We have re-tested the replication across haproxy
reload/restart and it appears it was working as you suggested, so
apologies there.

We have seen that when the table syncs between the 2 processes on the
same box during a restart or reload, and also when it syncs to a remote
peer, the persistence timeout counter is reset to the maximum value
rather than carried over.

Is it possible to request that the persistence timeout counters be
synced across this restart/reload as well?

It has however raised another question - how best to clear the tables
on all appliances at the same time?

Say the devices were out of sync, or there was a problem somewhere that
resulted in users being directed to the wrong place, so that you needed
to clear the tables and start again.

We could use the clear table socket command via socat, but that only
clears one device at a time, so you could end up in a state where you
clear instance1 and then clear instance2, but in the time between the
two clears some new users connect to instance1. So when you then clear
instance2, those entries will not be synchronised. This would be
particularly obvious if you were using something with a long connection
time, for example an RDP session. So if instance1 were to fail again,
you would not have all the entries for instance1 in instance2's
persistence table.
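
For reference, the per-device clear we are referring to is along these
lines, run on each appliance separately (assuming the admin-level
socket from the configuration above):

    echo "clear table VIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat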

The only thing we have been able to come up with so far is to put each
of the backend servers into maintenance mode first so they stop
accepting new connections, then clear the tables, then bring them back
online again.
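
A sketch of that workaround using the admin socket (server and backend
names as in the configuration above; disable/enable server require the
socket to be at admin level):

    echo "disable server VIP_Name/RIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat
    echo "disable server VIP_Name/RIP_Name-1" | socat stdio unix-connect:/var/run/haproxy.stat
    echo "clear table VIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat
    echo "enable server VIP_Name/RIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat
    echo "enable server VIP_Name/RIP_Name-1" | socat stdio unix-connect:/var/run/haproxy.stat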

Do you have a neater method of clearing the tables without having to
block users' access?

Mark
Willy Tarreau
Re: Testing Stick table replication with snapshot 20120222
March 10, 2012 02:10PM
Hi Mark,

On Thu, Feb 23, 2012 at 03:48:08PM +0000, Mark Brooks wrote:
> Thanks Willy, We have re-tested the replication across haproxy
> reload/restart and it appears it was working as you suggested. So
> apologies there.

You don't have to apologize, you might have encountered a real bug which
only appears once in a while. As I often say, reporting uncertain bugs
is better than nothing; at the very least it can prompt other people to
report a "me too".

Also, at Exceliance during some preliminary native-SSL tests, one of our
engineers noticed a bug which could possibly affect peers replication
after some error scenarios occur. It looks like if some errors happen
on the connection after a full replication, next connections will not
necessarily restart replication. It might be what you've observed. The
fix has been pushed into the master tree and I'm planning on a -dev8
next week as enough fixes are stacked there.

> We have seen that when restarting or reloading the table syncs between
> 2 processes on the same box and also when it syncs to a remote peer
> that the persistence timeout counter is reset to the maximum value and
> not carried with it.
> Is it possible to request the persistence timeout entries counters
> sync across this restart/reload?

No, timers are not exchanged, only the server ID. A number of other things
would need to be synced as well (e.g. counters), but that's still quite
difficult to do, so for now sessions are refreshed upon synchronization
just as if there was activity on them.

> It has however raised another question - How best to clear the tables
> on all appliances at the same time.

I unfortunately have no solution to this problem right now and I know
for sure that it can be annoying sometimes. It's not even haproxy-specific,
it's a general problem of how to make an information disappear from a global
system when it's replicated in real time and you can only destroy it on a
single node at a time. Some solutions would possibly involve sending deletion
orders to other nodes or just updating their expiration timer, I don't know
for now. I think that it will be easier or at least less critical when the
expiration timers are shared!

(...)
> The only thing we have been able to come up with so far is to put each
> of the backend servers in maintenance mode first so they stop
> accepting new connections then clear the tables then bring them back
> on-line again.

I think you could proceed differently: break the replication between
the nodes, clear all tables, then reopen replication. At least it would
not block user access or traffic.
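
Just as a sketch of the idea (untested, and only one possible way to do
it), the replication could for example be cut by blocking the peers
port while the tables are cleared on each node:

    # example only: block peer traffic on the peers port (7778 in the config above)
    iptables -I INPUT -p tcp --dport 7778 -j DROP
    # clear the local table on this node (repeat on every node)
    echo "clear table VIP_Name" | socat stdio unix-connect:/var/run/haproxy.stat
    # restore peer traffic
    iptables -D INPUT -p tcp --dport 7778 -j DROP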

Regards,
Willy