Hello stunnel-users,
I've recently setup stunnel in front of haproxy to handle SSL
connections. While it worked fine in testing, my simplistic
benchmarks showed it unable to do more than 9 req/s, and as we turned
some production traffic on, it appears that threads and connections
quickly pile up and eventually timing out. I assume I've made some
error in configuration, or there is something odd with my setup, as it
appears stunnel has been benchmarked and performs well beyond what
we're attempting here.
I've read the FAQ and as much from the mailing list archives as I
thought applicable, and I was unable to see any red flags or errors
similar to what I'm seeing.
If you don't mind, I'd like some help debugging the issue or pointers
as to what else I could do to debug the issue. I've included details
about my compile and configuration as well as the edited startup
section from the debug (7) log.
I compiled stunnel 4.44 with ./configure --disable-ipv6
--disable-libwrap --disable-fips. It's running on a different box
than where it was compiled, but both machines should be identical.
openssl version -a
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
built on: Thu Sep 3 14:22:06 EDT 2009
platform: linux-x86_64
options: bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) blowfish(ptr2)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS
-D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT
-I/usr/kerberos/include -DL_ENDIAN -DTERMIO -WallM
OPENSSLDIR: "/etc/pki/tls"
engines: dynamic
stunnel.4.44 -version
stunnel 4.44 on x86_64-unknown-linux-gnu platform
Compiled/running with OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Threading:PTHREAD SSL:ENGINE Auth:none Sockets:POLL,IPv4
Global options:
debug = daemon.notice
pid = /usr/local/var/run/stunnel/stunnel.pid
RNDbytes = 64
RNDfile = /dev/urandom
RNDoverwrite = yes
Service-level options:
ciphers = ALL:!SSLv2:!aNULL:!EXP:!LOW:-MEDIUM:RC4:+HIGH
curve = prime256v1
session = 300 seconds
sslVersion = TLSv1 for client, all for server
stack = 65536 bytes
TIMEOUTbusy = 300 seconds
TIMEOUTclose = 60 seconds
TIMEOUTconnect = 10 seconds
TIMEOUTidle = 43200 seconds
verify = none
Here's my configuration file:
sslVersion = all
ciphers = RC4-MD5:HIGH:!aNULL:!SSLv2
options = NO_SSLv2
setuid = root
setgid = haproxy
pid = /var/run/stunnel.pid
#socket = l:TCP_NODELAY=1
#socket = r:TCP_NODELAY=1
output = /var/log/stunnel.log
debug = 7
[XXX]
CAfile = /etc/stunnel/ca_chain.crt
cert = /etc/stunnel/XXX.crt
key = /etc/stunnel/XXX.key
accept = 443
connect = 127.0.0.1:81
#xforwardedfor = yes
#TIMEOUTclose = 5
#TIMEOUTbusy = 2
#TIMEOUTconnect = 2
#TIMEOUTidle = 2
(I was previously running 4.32 with the xforwardedfor patch, but to
eliminate potential issues I've upgraded to the latest version,
running stock without the patch, hence the commented out xforwardedfor
option)
here's the startup:
2011.10.01 12:18:38 LOG7[21868:47602610340864]: No limit detected for
the number of clients
2011.10.01 12:18:38 LOG7[21868:47602610340864]: signal_pipe: FD=3
allocated (non-blocking mode)
2011.10.01 12:18:38 LOG7[21868:47602610340864]: signal_pipe: FD=4
allocated (non-blocking mode)
2011.10.01 12:18:38 LOG5[21868:47602610340864]: stunnel 4.44 on
x86_64-unknown-linux-gnu platform
2011.10.01 12:18:38 LOG5[21868:47602610340864]: Compiled/running with
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
2011.10.01 12:18:38 LOG5[21868:47602610340864]: Threading:PTHREAD
SSL:ENGINE Auth:none Sockets:POLL,IPv4
2011.10.01 12:18:38 LOG5[21868:47602610340864]: Reading configuration
from file /etc/stunnel/stunnel.conf
2011.10.01 12:18:38 LOG7[21868:47602610340864]: PRNG seeded successfully
2011.10.01 12:18:38 LOG6[21868:47602610340864]: Initializing SSL
context for service XXX
2011.10.01 12:18:38 LOG4[21868:47602610340864]: Insecure file
permissions on /etc/stunnel/XXX.key
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Certificate:
/etc/stunnel/XXX.crt
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Certificate loaded
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Key file: /etc/stunnel/XXX.key
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Private key loaded
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Could not load DH
parameters from /etc/stunnel/XXX.crt
2011.10.01 12:18:38 LOG7[21868:47602610340864]: Using hardcoded DH parameters
2011.10.01 12:18:38 LOG7[21868:47602610340864]: DH initialized with 2048-bit key
2011.10.01 12:18:38 LOG7[21868:47602610340864]: SSL options set: 0x01000000
2011.10.01 12:18:38 LOG6[21868:47602610340864]: SSL context initialized
2011.10.01 12:18:38 LOG5[21868:47602610340864]: Configuration successful
Later, I see logs more or less like this:
2011.10.01 12:41:44 LOG7[29902:1108293952]: str_stats: 0 block(s), 0
data byte(s), 0 control byte(s)
2011.10.01 12:41:43 LOG7[29902:1116580160]: Service XXX started
2011.10.01 12:41:44 LOG7[29902:1116580160]: Option TCP_NODELAY set on
local socket
2011.10.01 12:41:44 LOG5[29902:1116580160]: Service XXX accepted
connection from YYY1:50059
2011.10.01 12:41:44 LOG7[29902:1108363584]: SSL state (accept): SSLv3
write server done A
2011.10.01 12:41:43 LOG7[29902:47056920704000]: local socket: FD=184
allocated (non-blocking mode)
2011.10.01 12:41:44 LOG7[29902:1108363584]: SSL state (accept): SSLv3 flush data
2011.10.01 12:41:44 LOG7[29902:47056920704000]: Service XXX accepted
FD=184 from YYY2:58131
2011.10.01 12:41:44 LOG3[29902:1108363584]: SSL_accept: Peer suddenly
disconnected
2011.10.01 12:41:44 LOG7[29902:1108433216]: SSL state (accept):
before/accept initialization
2011.10.01 12:41:44 LOG7[29902:47056920704000]: local socket: FD=11
allocated (non-blocking mode)
2011.10.01 12:41:44 LOG7[29902:47056920704000]: Service XXX accepted
FD=11 from YYY3:49309
2011.10.01 12:41:44 LOG5[29902:1108363584]: Connection reset: 0 bytes
sent to SSL, 0 bytes sent to socket
2011.10.01 12:41:44 LOG7[29902:1108363584]: Service XXX finished (173 left)
I see the thread count steadily increase:
# ps -eLf |grep stun |wc -l
513
later:
# ps -eLf |grep stun |wc -l
1020
(the FD ulimit for this proces is only 1024 currently, but that does
not appear to be the limiting factor yet)
It's almost like stunnel is only processing one connection at a time
to the local destination port, which causes stunnel to build up a pile
of waiting connections and eventually they start timing out. I've
verified with nc and some local tests that the 127.0.0.1:81 port is
picking up connections even under load, so I don't believe the issue
is on the haproxy side. The firewall is set to allow all localhost
connections and the 443 ports are configured the same as the port 80
ports which do hundreds of reqs a second, further ruling out firewall
issues.
Is there any setting I'm missing that would or wouldn't allow stunnel
to process connect calls in parallel? Is there any additional debug
information I can provide?
Thank you for your time.