Question about behavior with "failover = prio"

Hello again.

We recently noticed a behavior in stunnel that surprised us, and seems to be inconsistent with the man page description. We use "failover = prio" in our configuration file. To refresh everyone's memory, the man page describes this option here:

    failover = rr | prio
        Failover strategy for multiple "connect" targets.

            rr (round robin) - fair load distribution
            prio (priority) - use the order specified in config file

        default: rr

Our config file contains a list of connect targets, like this:

    [...]
    connect = h1.example.com
    connect = h2.example.com
    connect = h3.example.com
    connect = h4.example.com
    [...]

Based on the description, we expect the connection to try h1 first, and if h1 fails, failover through the remainder of the list, in-order. However, what we observe is that the connections start at a seemingly random place in the list. It does appear to proceed in-order...it's just the starting place that seems incorrect. For example, we might get a sequence that proceeds "h3->h4->h1->h2".

Please let me know if I am misunderstanding the intended behavior of the 'prio' failover strategy.

Here is a trial patch that hard-codes the starting place for the failover to the beginning of the list. I am not sure if this change has any unintended consequences for other configurations. But as a proof-of-concept, it does seem to fix the behavior to be consistent with our reading of the man page. This patch is against 5.20 but it appears that 5.21b2 acts the same way.

Thanks!
Michael

diff -ru stunnel-5.20.orig/src/client.c stunnel-5.20-prio-fix/src/client.c
--- stunnel-5.20.orig/src/client.c 2015-07-16 15:55:52.213064746 -0700
+++ stunnel-5.20-prio-fix/src/client.c 2015-07-17 16:13:12.129184507 -0700
@@ -1285,7 +1285,8 @@
         *c->connect_addr.rr_ptr=(i+1)%c->connect_addr.num;
         s_log(LOG_INFO, "failover: round-robin");
     } else {
-        s_log(LOG_INFO, "failover: priority");
+        s_log(LOG_INFO, "failover: priority, starting at first entry");
+        i=0;
     }
     return i;
 }

On 21.07.2015 00:09, Michael Gebis wrote:
> Please let me know if I am misunderstanding the intended behavior of
> the 'prio' failover strategy.

You understand it correctly. The "failover = prio" feature has indeed been broken since stunnel 5.15.

Your fix is also correct: the rr_val/rr_ptr values should not be used for the 'prio' strategy.

Thank you for reporting the bug. It will be fixed in the next release.

Mike
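For readers following along, the intended difference between the two strategies can be sketched in isolation. This is only an illustration under stated assumptions, not stunnel's actual code: the names targets_t, rr_next, first_target and nth_target are hypothetical, and the shared cursor merely stands in for the rr_val/rr_ptr values mentioned above.

```c
/* Hypothetical sketch; these names do not appear in the stunnel sources. */
typedef enum { FAILOVER_RR, FAILOVER_PRIO } failover_t;

typedef struct {
    unsigned num;     /* number of "connect" targets, assumed > 0 */
    unsigned rr_next; /* shared round-robin cursor (stand-in for rr_val/rr_ptr) */
} targets_t;

/* Return the index of the first target to try for a new connection. */
static unsigned first_target(targets_t *t, failover_t mode) {
    unsigned i;

    if (mode == FAILOVER_RR) {
        /* rr: start at the cursor and advance it for the next connection */
        i = t->rr_next % t->num;
        t->rr_next = (i + 1) % t->num;
        return i;
    }
    /* prio: always start at the first entry and leave the cursor alone */
    return 0;
}

/* Return the index to try on the given 0-based attempt, wrapping around. */
static unsigned nth_target(unsigned start, unsigned attempt, unsigned num) {
    return (start + attempt) % num;
}
```

With four targets, a 'prio' connection would then walk h1->h2->h3->h4 regardless of the cursor, while an 'rr' connection starting at h3 would walk h3->h4->h1->h2, which is the sequence reported above for the broken 'prio' case.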
Participants (2): Michael Gebis, Michal Trojnara