[stunnel-users] stunnel randomly crashing
Chris Knipe
savage at savage.za.org
Thu Feb 2 15:10:38 CET 2017
Hi All,
Let me first get the formalities out of the way:
stunnel 5.40 on x86_64-unknown-linux-gnu platform
Compiled/running with OpenSSL 1.0.1f 6 Jan 2014
Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,PSK,SNI
I compiled the latest version of stunnel today, and yes, I know I am running an old/insecure version of OpenSSL.
stunnel.cnf
debug = debug
pid = /var/run/stunnel.pid
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
ciphers = ALL
options = NO_SSLv2
fips = no
[my.service]
accept = *:501
CAfile = /etc/stunnel/my.service.ca
cert = /etc/stunnel/my.service.pem
exec = /path/to/my/server
TIMEOUTclose = 0
Here’s what’s been happening. I’ve been running stunnel with the above service for almost 4 or 5 years now, BUT, always under xinetd. I never had a single issue; stunnel served me well, and I was rather happy. Lately, however, the number of connections (and, I guess more importantly, the rate of incoming connections) has been increasing steadily on the server. After lots of debugging, we determined the load was due to xinetd constantly firing up new stunnel processes. And by a lot, I am talking about 20-30+ connections/sec.
So today, I took the time and changed our entire cluster of 17 servers…. All servers were upgraded to the latest version (we were on 5.24 previously), and instead of using xinetd I amended the configurations so that stunnel now runs in daemon mode (under root). For the most part, it works absolutely fine. As you can see, the configuration is in maximum debugging mode, so I am getting as much info as I can out of the logs. The logs show absolutely nothing as to what is happening and why ☹ I’m more than happy to provide the logs to someone to look at, but there are THOUSANDS of connections and debug messages, so it’s large, very large.
After a seemingly random amount of time (from a few minutes to a few hours), and after successfully accepting THOUSANDS of connections, stunnel just dies. Nothing abnormal is logged, nothing in dmesg, no crash dump. The process simply dies. By default stunnel accepts 500 connections (which is a bit low), but I have also confirmed that it is not running out of file descriptors. When it does run out, stunnel logs the appropriate connection refused messages and continues to run (i.e. it does not crash; we’ve specifically tested that).
# cat /proc/7095/limits
Limit                     Soft Limit    Hard Limit    Units
Max cpu time              unlimited     unlimited     seconds
Max file size             unlimited     unlimited     bytes
Max data size             unlimited     unlimited     bytes
Max stack size            8388608       unlimited     bytes
Max core file size        0             unlimited     bytes
Max resident set          unlimited     unlimited     bytes
Max processes             257585        257585        processes
Max open files            1024          4096          files
Max locked memory         65536         65536         bytes
Max address space         unlimited     unlimited     bytes
Max file locks            unlimited     unlimited     locks
Max pending signals       257585        257585        signals
Max msgqueue size         819200        819200        bytes
Max nice priority         0             0
Max realtime priority     0             0
Max realtime timeout      unlimited     unlimited     us
# ls /proc/7095/fd/ | wc -l
956
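For anyone wanting to reproduce the check above, this is roughly how I compare the descriptor count against the soft/hard "Max open files" limits (the PID here is a placeholder; it was 7095 in my case):

```shell
# Compare a process's open-descriptor count against its soft/hard
# "Max open files" limits; PID is a placeholder (7095 above)
PID=$$
ls "/proc/$PID/fd" | wc -l
awk '/Max open files/ {print "soft:", $4, "hard:", $5}' "/proc/$PID/limits"
```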
We are very far from the limits (from what I understand, at least; since we are running as root, the hard limit should apply?).
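One thing the limits dump does show is a soft core-file limit of 0, which by itself would explain why a dying process leaves nothing behind. A hedged sketch for capturing a core next time, assuming stunnel is restarted from the same shell (the core_pattern path is just an example):

```shell
# The soft core-file limit above is 0, so a crash leaves no core dump.
# Raise it for this shell, then restart stunnel from here so the daemon
# inherits the new limit.
ulimit -c unlimited
ulimit -c
# Optionally direct cores to a known path (requires root; path is an example):
# echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
```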
Except for the fact that the servers are very busy in terms of incoming connections/sec (although +- 20/sec surely can’t be that much?), is there anything else I could possibly look at? After moving from xinetd to daemon mode, the load on the server dropped by > 60%, so the saving is significant and I don’t want to go back to xinetd mode if I can avoid it. It also means the machine is no longer under any significant strain, so load shouldn’t be a factor affecting stunnel.
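In case the 500-connection default ceiling is tied to the open-file limit the daemon inherits (an assumption on my side), one low-risk experiment is to raise the soft limit to the hard limit in the script that launches stunnel. The wrapper and binary path below are hypothetical:

```shell
#!/bin/sh
# Hypothetical launch wrapper: raise the open-file soft limit to the hard
# limit so the stunnel daemon inherits the higher ceiling.
ulimit -n "$(ulimit -Hn)"
ulimit -n
# exec /usr/local/bin/stunnel /etc/stunnel/stunnel.cnf   # example path
```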
Thnx,
Chris.