Problem with round-robin failover mode?

Matt Wise

7 Jan 2013

5:38 p.m.

I've got dozens of clients connecting with Stunnel to a group of 5 servers. Each system has a config that looks like this:

cert = /etc/stunnel/zookeeper.pem
key = /etc/stunnel/zookeeper.key
CAfile = /etc/stunnel/zookeeper_ca.pem
verify = 2
delay = yes
sslVersion = TLSv1
client = yes
setuid = stunnel4
setgid = stunnel4
pid = /var/lib/stunnel4/zookeeper.stunnel4.pid
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTconnect = 2
session = 86400
debug = 5

[zookeeper]
accept = 127.0.0.1:2182
failover = rr
connect = prod-zookeeper:2182
connect = prod-zookeeper-1:2182
connect = prod-zookeeper-2:2182
connect = prod-zookeeper-3:2182
connect = prod-zookeeper-4:2182
connect = prod-zookeeper-5:2182

Essentially the first host is a load balancer, and the next 5 are the actual zookeeper hosts, so that we can bypass the ELB if it's giving us fits. Now what we're seeing is that almost every connection ends up on prod-zookeeper-5. Over and over and over again, our hosts pick the same system each time. We're running Stunnel 4.52:

Clients allowed=8000
stunnel 4.52 on i486-pc-linux-gnu platform
Compiled/running with OpenSSL 0.9.8k 25 Mar 2009
Threading:PTHREAD SSL:ENGINE Auth:LIBWRAP Sockets:POLL,IPv6

Any ideas what might be wrong here? Obviously we want the connections to be *roughly* random across the list of hosts... and if one of the hosts goes down, and the connection fails, we want the stunnel service to try again, and randomly pick a new host. It doesn't really seem to be doing that though.

--Matt

Brian Wilkins

7 Jan 2013

6:21 p.m.

I am pretty sure stunnel uses the final connect string as the connect host. Round robin only works if DNS returns multiple addresses. There was a user patch a while ago that provided a different way.

_______________________________________________ stunnel-users mailing list stunnel-users@stunnel.org https://www.stunnel.org/cgi-bin/mailman/listinfo/stunnel-users

Matt Wise

6:28 p.m.

That's really odd because the man page states that you can use multiple connect statements.

*connect = address*

connect to a remote address

If no host is specified, the host defaults to localhost.

Multiple *connect* options are allowed in a single service section.

If host resolves to multiple addresses and/or if multiple *connect* options are specified, then the remote address is chosen using a round-robin algorithm.

However, I do think we're seeing the behavior you mentioned...

Brian Wilkins

7:08 p.m.

Yes, but the key phrase is "resolves to multiple addresses", meaning the lookup returns multiple IP addresses. I don't think your setup is doing that. If it is, then I am wrong and there is a bug.

Michal Trojnara

7:39 p.m.

Hi Matt,

Load balancing is incompatible with delayed resolver. Remove "delay = yes" from your configuration file.

Mike
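
For illustration, here is a minimal sketch of Mike's suggestion applied to the configuration from the original post. It assumes stunnel 4.52 or 4.53 and that every connect hostname can be resolved when stunnel starts, since without the delayed resolver the names are looked up at startup. Only the delay line is dropped; every other option and hostname is copied from the original post, and the comments are added here for explanation.

```ini
; Options from the original post, with "delay = yes" removed
; so that all connect targets are resolved at startup.
cert = /etc/stunnel/zookeeper.pem
key = /etc/stunnel/zookeeper.key
CAfile = /etc/stunnel/zookeeper_ca.pem
verify = 2
sslVersion = TLSv1
client = yes
setuid = stunnel4
setgid = stunnel4
pid = /var/lib/stunnel4/zookeeper.stunnel4.pid
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTconnect = 2
session = 86400
debug = 5

[zookeeper]
accept = 127.0.0.1:2182
; round-robin across all listed connect targets
failover = rr
connect = prod-zookeeper:2182
connect = prod-zookeeper-1:2182
connect = prod-zookeeper-2:2182
connect = prod-zookeeper-3:2182
connect = prod-zookeeper-4:2182
connect = prod-zookeeper-5:2182
```

One likely trade-off: a hostname whose address changes after startup (an ELB, for example) would not be looked up again until stunnel reloads its configuration or restarts.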

Matt Wise

8:39 p.m.

Ah. That's it! I also see a fix in 4.54, am I right?

* "delay = yes" fixed to work even if specified *after* "connect" option.
* Multiple "connect" targets fixed to also work with delayed resolver.

--Matt

Matt Wise

9:13 p.m.

Following up... I have confirmed that in 4.54, delay=yes works fine with multiple connect= options.

--Matt

Matt Wise

5 Mar 2013

11:08 p.m.

This is odd, but we're seeing failures again with this config. We're seeing that with delay=yes, and 6 total connection targets (5 servers, 1 ELB), stunnel picks up the first connection (the ELB) and never uses any of the other targets. Ever. If we use tcpkill to block access to the ELB, we end up completely screwed.

Anyone else seeing this as a problem still?

Michal Trojnara

1 May 2013

4:40 p.m.

Hi,

I have updated the manual (http://www.stunnel.org/static/stunnel.html):

*delay = yes | no*

delay DNS lookup for /connect/ option

This option is useful for dynamic DNS, or when DNS is not available during *stunnel* startup (road warrior VPN, dial-up configurations).

Delayed resolver mode is automatically engaged when stunnel fails to resolve on startup any of the /connect/ targets for a service.

Delayed resolver inflicts /failover = prio/.

default: no

Mike
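
As a rough illustration of what the updated manual text implies, the sketch below keeps delay = yes and spells out failover = prio explicitly instead of failover = rr. Under prio the connect targets are used in the order listed in the file, so the load balancer is placed first and the direct hosts follow as fallbacks. This is a sketch based only on the manual wording quoted above, not a tested configuration; the hostnames are those from the original post, and the remaining options are assumed to be unchanged.

```ini
; Service section only; the other options from the original post
; are assumed to stay as they were.
[zookeeper]
client = yes
accept = 127.0.0.1:2182
; DNS lookups for the connect targets are deferred
delay = yes
; per the updated manual, the delayed resolver implies priority order,
; so it is stated explicitly here instead of "failover = rr"
failover = prio
; highest priority first: the load balancer, then the direct hosts
connect = prod-zookeeper:2182
connect = prod-zookeeper-1:2182
connect = prod-zookeeper-2:2182
connect = prod-zookeeper-3:2182
connect = prod-zookeeper-4:2182
connect = prod-zookeeper-5:2182
```

This would match the March report that the first target (the ELB) was always chosen. Whether the later targets are actually tried once the first becomes unreachable is the question that report raises, and it is not settled in this thread.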
