Hello!
Recently we started seeing some erroneous behavior in one of our load balancer configurations in which we're using stunnel for SSL tunnel termination with web clients. Specifically, the setup looks like the following:
Web client ---(SSL)---> stunnel ---(unix socket)--> haproxy ---(tcp/http)---> web server
After extensive troubleshooting we were able to determine that when the following parameters are present:
* Client forces non-keepalive connection * Client generates a POST request, the response to which (generated by the back-end web server) is greater than about 8k * stunnel and haproxy communicate via unix socket
...then most responses end up being truncated by an extraneous TCP RST packet from stunnel, and the following log lines are generated:
Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: transfer() loop executes not transferring any data Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: please report the problem to Michal.Trojnara@mirt.net Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: stunnel 4.54 on x86_64-unknown-linux-gnu platform Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: Compiled/running with OpenSSL 1.0.0-fips 29 Mar 2010 Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: Threading:PTHREAD SSL:+ENGINE+OCSP+FIPS Auth:none Sockets:POLL+IPv6 Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: protocol=TLSv1, SSL_pending=0 Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: sock_open_rd=n, sock_open_wr=Y Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: SSL_RECEIVED_SHUTDOWN=n, SSL_SENT_SHUTDOWN=Y Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: sock_can_rd=Y, sock_can_wr=n Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: ssl_can_rd=n, ssl_can_wr=n Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: read_wants_read=Y, read_wants_write=n Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: write_wants_read=n, write_wants_write=n Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: shutdown_wants_read=n, shutdown_wants_write=n Jan 23 15:24:17 ds420a stunnel: LOG3[22586:139835614443264]: socket input buffer: 0 byte(s), ssl input buffer: 0 byte(s)
The work arounds we have found are either:
1. Make stunnel and haproxy communicate over a TCP socket bound to the loopback interface (less ideal, as then we're limited to 65k connections) 2. Add the "reset=no" parameter to our stunnel.conf (which seems to fix the problem, but makes me wonder if we're not opening ourselves up to other problems-- like really getting stuck in an infinite loop that that watchdog is intended to avoid.)
In any case based on what "reset=no" is supposed to do as well as where that first line of the error output above is being generated in the code, it's clear that the main client data transfer loop in client.c (starting line 519) is being called too often with no data to transfer, which triggers the infinite loop detection in the transfer() function (client.c, starting line 832).
My best hypothesis at this point is that the transfer() code was written only with TCP sockets in mind, and that something about the unix socket in use here is triggering that loop too often. Maybe s_poll_wait on line 541 is being called with parameters which are right for a TCP socket but inappropriate for a unix socket? At this point though, we're at the limit of my socket programming ability, so I'm a little stuck. (Would like to have simply fixed this and submitted a patch... but cest la vie, eh.)
We'd like to avoid using a TCP socket to communicate between stunnel and haproxy simply because it's a little less efficient, and because there's an artificial 65k limit on the number of back-end connections we can support per loopback IP if we do that. Alternatively, if someone could verify that "reset=no" is an acceptable / safe work around in this case, I'd appreciate that (though, of course, I'd prefer my logs not to get spammed on every connection. :P )
Is there any other data I can provide to help troubleshoot and fix this?
Thanks, Stephen
-- Stephen Balukoff Blue Box Group, LLC (800)613-4305 x807