Hello stunnel-users,
I've recently setup stunnel in front of haproxy to handle SSL connections. While it worked fine in testing, my simplistic benchmarks showed it unable to do more than 9 req/s, and as we turned some production traffic on, it appears that threads and connections quickly pile up and eventually timing out. I assume I've made some error in configuration, or there is something odd with my setup, as it appears stunnel has been benchmarked and performs well beyond what we're attempting here.
I've read the FAQ and as much from the mailing list archives as I thought applicable, and I was unable to see any red flags or errors similar to what I'm seeing.
If you don't mind, I'd like some help debugging the issue or pointers as to what else I could do to debug the issue. I've included details about my compile and configuration as well as the edited startup section from the debug (7) log.
I compiled stunnel 4.44 with ./configure --disable-ipv6 --disable-libwrap --disable-fips. It's running on a different box than where it was compiled, but both machines should be identical.
openssl version -a OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008 built on: Thu Sep 3 14:22:06 EDT 2009 platform: linux-x86_64 options: bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) blowfish(ptr2) compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -I/usr/kerberos/include -DL_ENDIAN -DTERMIO -WallM OPENSSLDIR: "/etc/pki/tls" engines: dynamic
stunnel.4.44 -version stunnel 4.44 on x86_64-unknown-linux-gnu platform Compiled/running with OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008 Threading:PTHREAD SSL:ENGINE Auth:none Sockets:POLL,IPv4
Global options: debug = daemon.notice pid = /usr/local/var/run/stunnel/stunnel.pid RNDbytes = 64 RNDfile = /dev/urandom RNDoverwrite = yes
Service-level options: ciphers = ALL:!SSLv2:!aNULL:!EXP:!LOW:-MEDIUM:RC4:+HIGH curve = prime256v1 session = 300 seconds sslVersion = TLSv1 for client, all for server stack = 65536 bytes TIMEOUTbusy = 300 seconds TIMEOUTclose = 60 seconds TIMEOUTconnect = 10 seconds TIMEOUTidle = 43200 seconds verify = none
Here's my configuration file:
sslVersion = all ciphers = RC4-MD5:HIGH:!aNULL:!SSLv2 options = NO_SSLv2 setuid = root setgid = haproxy pid = /var/run/stunnel.pid #socket = l:TCP_NODELAY=1 #socket = r:TCP_NODELAY=1 output = /var/log/stunnel.log debug = 7
[XXX] CAfile = /etc/stunnel/ca_chain.crt cert = /etc/stunnel/XXX.crt key = /etc/stunnel/XXX.key accept = 443 connect = 127.0.0.1:81 #xforwardedfor = yes #TIMEOUTclose = 5 #TIMEOUTbusy = 2 #TIMEOUTconnect = 2 #TIMEOUTidle = 2
(I was previously running 4.32 with the xforwardedfor patch, but to eliminate potential issues I've upgraded to the latest version, running stock without the patch, hence the commented out xforwardedfor option)
here's the startup:
2011.10.01 12:18:38 LOG7[21868:47602610340864]: No limit detected for the number of clients 2011.10.01 12:18:38 LOG7[21868:47602610340864]: signal_pipe: FD=3 allocated (non-blocking mode) 2011.10.01 12:18:38 LOG7[21868:47602610340864]: signal_pipe: FD=4 allocated (non-blocking mode) 2011.10.01 12:18:38 LOG5[21868:47602610340864]: stunnel 4.44 on x86_64-unknown-linux-gnu platform 2011.10.01 12:18:38 LOG5[21868:47602610340864]: Compiled/running with OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008 2011.10.01 12:18:38 LOG5[21868:47602610340864]: Threading:PTHREAD SSL:ENGINE Auth:none Sockets:POLL,IPv4 2011.10.01 12:18:38 LOG5[21868:47602610340864]: Reading configuration from file /etc/stunnel/stunnel.conf 2011.10.01 12:18:38 LOG7[21868:47602610340864]: PRNG seeded successfully 2011.10.01 12:18:38 LOG6[21868:47602610340864]: Initializing SSL context for service XXX 2011.10.01 12:18:38 LOG4[21868:47602610340864]: Insecure file permissions on /etc/stunnel/XXX.key 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Certificate: /etc/stunnel/XXX.crt 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Certificate loaded 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Key file: /etc/stunnel/XXX.key 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Private key loaded 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Could not load DH parameters from /etc/stunnel/XXX.crt 2011.10.01 12:18:38 LOG7[21868:47602610340864]: Using hardcoded DH parameters 2011.10.01 12:18:38 LOG7[21868:47602610340864]: DH initialized with 2048-bit key 2011.10.01 12:18:38 LOG7[21868:47602610340864]: SSL options set: 0x01000000 2011.10.01 12:18:38 LOG6[21868:47602610340864]: SSL context initialized 2011.10.01 12:18:38 LOG5[21868:47602610340864]: Configuration successful
Later, I see logs more or less like this:
2011.10.01 12:41:44 LOG7[29902:1108293952]: str_stats: 0 block(s), 0 data byte(s), 0 control byte(s) 2011.10.01 12:41:43 LOG7[29902:1116580160]: Service XXX started 2011.10.01 12:41:44 LOG7[29902:1116580160]: Option TCP_NODELAY set on local socket 2011.10.01 12:41:44 LOG5[29902:1116580160]: Service XXX accepted connection from YYY1:50059 2011.10.01 12:41:44 LOG7[29902:1108363584]: SSL state (accept): SSLv3 write server done A 2011.10.01 12:41:43 LOG7[29902:47056920704000]: local socket: FD=184 allocated (non-blocking mode) 2011.10.01 12:41:44 LOG7[29902:1108363584]: SSL state (accept): SSLv3 flush data 2011.10.01 12:41:44 LOG7[29902:47056920704000]: Service XXX accepted FD=184 from YYY2:58131 2011.10.01 12:41:44 LOG3[29902:1108363584]: SSL_accept: Peer suddenly disconnected 2011.10.01 12:41:44 LOG7[29902:1108433216]: SSL state (accept): before/accept initialization 2011.10.01 12:41:44 LOG7[29902:47056920704000]: local socket: FD=11 allocated (non-blocking mode) 2011.10.01 12:41:44 LOG7[29902:47056920704000]: Service XXX accepted FD=11 from YYY3:49309 2011.10.01 12:41:44 LOG5[29902:1108363584]: Connection reset: 0 bytes sent to SSL, 0 bytes sent to socket 2011.10.01 12:41:44 LOG7[29902:1108363584]: Service XXX finished (173 left)
I see the thread count steadily increase:
# ps -eLf |grep stun |wc -l 513 later: # ps -eLf |grep stun |wc -l 1020
(the FD ulimit for this proces is only 1024 currently, but that does not appear to be the limiting factor yet)
It's almost like stunnel is only processing one connection at a time to the local destination port, which causes stunnel to build up a pile of waiting connections and eventually they start timing out. I've verified with nc and some local tests that the 127.0.0.1:81 port is picking up connections even under load, so I don't believe the issue is on the haproxy side. The firewall is set to allow all localhost connections and the 443 ports are configured the same as the port 80 ports which do hundreds of reqs a second, further ruling out firewall issues.
Is there any setting I'm missing that would or wouldn't allow stunnel to process connect calls in parallel? Is there any additional debug information I can provide?
Thank you for your time.
Lamont Lucas wrote:
While it worked fine in testing, my simplistic benchmarks showed it unable to do more than 9 req/s, and as we turned some production traffic on, it appears that threads and connections quickly pile up and eventually timing out. I assume I've made some error in configuration, or there is something odd with my setup, as it appears stunnel has been benchmarked and performs well beyond what we're attempting here.
[cut]
openssl version -a OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Upgrade your OpenSSL to 1.0.0b or later. The workaround for http://www.securityfocus.com/bid/44884/discuss is very slow...
Mike