Hi,
We have a problem in our production system and need help isolating the cause. We want to know whether the issue is with stunnel. We are not in a position to upgrade, since we are in production. Your help is appreciated.
Environment:
1) The client program (Java on Windows desktops) connects to the stunnel port on the server (Solaris), which is forwarded locally to an inetd program that connects to the Oracle database (2-tier architecture).
2) The setup worked for almost 2 years without any issue. Since last September, we have had intermittent connection resets on the application clients.
3) Many clients (in the hundreds) get reset, and we find hundreds of the messages below in the /var/adm/messages file on Solaris.
stunnel: [ID 821868 daemon.info] LOG6[21204:5272]: s_poll_wait timeout: connection reset
stunnel: [ID 821868 daemon.notice] LOG5[21204:4804]: Connection reset: 1867341 bytes sent to SSL, 55211 bytes sent to socket
4) After a reset, the client's attempt to log in again works without any issue. There is no pattern in load, time, or number of users.
5) The database, the inetd program, and stunnel all run under the oracle user ID.
6) No other OS-related or resource-related errors in /var/adm/messages.
7) At this point, I don't have a stunnel debug log from the time of the problem, since we have a cron job that recycles logs. If required, I will send it later.
8) In terms of changes made before the first occurrence of the issue, we did a CPU/RAM upgrade and applied an Oracle Database CPU patch. But the database team says no errors were reported on the Oracle side during the resets.
9) The maximum number of users on the database is around 1024, while our maximum is 300.
********************************************************************
Config:
# /usr/local/bin/stunnel -version
stunnel 4.25 on sparc-sun-solaris2.10 with OpenSSL 0.9.7d 17 Mar 2004 (+ security fixes for: CVE-2005-2969 CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339 CVE-2006-4343 CVE-2007-5135 CVE-2007-3108 CVE-2008-5077 CVE-2009-0590 CVE-2009-3555)
Threading:PTHREAD SSL:ENGINE Sockets:POLL,IPv6 Auth:LIBWRAP

Global options
debug = 5
pid = /usr/local/var/run/stunnel/stunnel.pid
RNDbytes = 64
RNDfile = /dev/urandom
RNDoverwrite = yes

Service-level options
cert = /usr/local/etc/stunnel/stunnel.pem
ciphers = ALL:!DHE-RSA-AES256-SHA:!DHE-DSS-AES256-SHA:!AES256-SHA:!ADH:+RC4:@STRENGTH
key = /usr/local/etc/stunnel/stunnel.pem
session = 300 seconds
stack = 65536 bytes
sslVersion = SSLv3 for client, all for server
TIMEOUTbusy = 300 seconds
TIMEOUTclose = 60 seconds
TIMEOUTconnect = 10 seconds
TIMEOUTidle = 43200 seconds
verify = none
-------------------------------------------------
# uname -a
SunOS xxxxxxxxxx 5.10 Generic_144488-02 sun4u sparc SUNW,SPARC-Enterprise
---------------------------------------------------------------------
# gcc -v
Reading specs from /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/specs
Configured with: /sfw10/builds/build/sfw10-patch/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/ccs/bin/as --without-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
------------------------------------------------------------------------
# openssl version
OpenSSL 0.9.7d 17 Mar 2004 (+ security fixes for: CVE-2005-2969 CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339 CVE-2006-4343 CVE-2007-5135 CVE-2007-3108 CVE-2008-5077 CVE-2009-0590 CVE-2009-3555)
-----------------------------------------------------------------------------
# more avaloq.conf (SSL config file we use: /usr/local/bin/stunnel /opt/app/ssl/stunnel.conf)
; Sample stunnel configuration file by Michal Trojnara 2002-2006
; Some options used here may not be adequate for your particular configuration
; Please make sure you understand them (especially the effect of chroot jail)

; Certificate/key is needed in server mode and optional in client mode
cert = /opt/app/ssl/cert.pem
key = /opt/app/ssl/cert.pem

; Protocol version (all, SSLv2, SSLv3, TLSv1)
sslVersion = all

; Some security enhancements for UNIX systems - comment them out on Win32
;chroot = /usr/local/var/lib/stunnel/
;setuid = nobody
;setgid = nogroup
; PID is created inside chroot jail
pid = /opt/app/ssl/avaloq_ssl/avaloq.pid

; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
;compression = rle

; Workaround for Eudora bug
;options = DONT_INSERT_EMPTY_FRAGMENTS

; Authentication stuff
;verify = 2
; Don't forget to c_rehash CApath
; CApath is located inside chroot jail
;CApath = /certs
; It's often easier to use CAfile
;CAfile = /usr/local/etc/stunnel/certs.pem
; Don't forget to c_rehash CRLpath
; CRLpath is located inside chroot jail
;CRLpath = /crls
; Alternatively you can use CRLfile
;CRLfile = /usr/local/etc/stunnel/crls.pem

; Some debugging stuff useful for troubleshooting
debug = 7
output = /db/avaloq/export/stunnel.log

; Use it for client mode
;client = yes

; Service-level configuration

[application]
accept = 7766
connect = localhost:7767
TIMEOUTconnect = 60

;[pop3s]
;accept = 995
;connect = 110

;[imaps]
;accept = 993
;connect = 143

;[ssmtp]
;accept = 465
;connect = 25

;[https]
;accept = 443
;connect = 80
;TIMEOUTclose = 0

; vim:ft=dosini
--------------------------------------------------------------------------------
# netstat -an | grep 7766 | wc -l
249
Regards, kumar
manoj kumar wrote:
stunnel: [ID 821868 daemon.info] LOG6[21204:5272]: s_poll_wait timeout: connection reset
stunnel: [ID 821868 daemon.notice] LOG5[21204:4804]: Connection reset: 1867341 bytes sent to SSL, 55211 bytes sent to socket
What it means is that either your clients were idle for more than 12 hours (your TIMEOUTidle is 43200 seconds), or they were disconnected without closing their connection (due to power outage, changed IP address, network availability issues, etc.).
If it's okay for your clients to keep connections idle for more than 12 hours, you can increase TIMEOUTidle. Just be aware that an excessive number of unused connections could exhaust your operating system resources (file descriptors, memory).
Mike
Thanks for your reply.
No, we are getting connection resets even when clients are active and entering orders. Moreover, the problem sometimes recurs within 3-4 hours. It happens during both business and non-business hours. Again, there is no pattern to isolate the cause.
Since only stunnel logs any message during this problem, everybody is pointing to stunnel, which I disagree with. I need your help to confirm the points below. It will help convince management that the issue is not with stunnel.
1) What does "s_poll_wait timeout" mean?
2) Does this point to the connection between (client & stunnel) or (stunnel & inetd)?
3) Was stunnel resetting connections because of the timeout, or merely reporting (logging) the lost connectivity?
4) If the file descriptor limit is reached, does stunnel deliberately terminate the existing sessions? (Note: the database and stunnel run under the same user ID, and we don't have any issue with the database.)
5) Is there any chance that the session cache is causing this issue?
6) It is very unlikely that hundreds of clients would time out at the same time (hour, minute). Any suggestions for areas that we could look into?
Thanks for your help.
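One way to look for a pattern is to count the reset messages per minute and see whether they arrive in bursts. The Python sketch below is only a starting point. It assumes that each line in /var/adm/messages begins with the usual syslog timestamp (`Mon DD HH:MM:SS`), which the excerpt in this thread does not show, and the function name `resets_per_minute` is made up for the example.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: a syslog-style prefix such as "Jan 18 10:15:32 host " precedes
# the "stunnel: ..." text quoted in this thread; adjust the pattern if not.
# Only the LOG5 "Connection reset:" lines are matched, so each reset
# connection is counted once (the "s_poll_wait timeout" lines are skipped).
RESET_RE = re.compile(
    r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}):\d{2}\s.*"
    r"stunnel: .*Connection reset:"
)


def resets_per_minute(lines):
    """Count 'Connection reset:' log lines, keyed by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log file path as the first argument, e.g. /var/adm/messages.
    with open(sys.argv[1], errors="replace") as handle:
        per_minute = resets_per_minute(handle)
    # Keys are sorted as text, which is chronological within a single day.
    for minute, count in sorted(per_minute.items()):
        print(minute, count)
```

If most resets fall into a handful of minutes, that points to a shared event (network, host, or a common timer) and away from individual idle clients.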
________________________________ From: Michal Trojnara Michal.Trojnara@mirt.net To: stunnel-users@mirt.net Sent: Tue, 18 January, 2011 4:48:58 PM Subject: Re: [stunnel-users] s_poll_wait timeout: connection reset
_______________________________________________
stunnel-users mailing list
stunnel-users@mirt.net
http://stunnel.mirt.net/mailman/listinfo/stunnel-users
manoj kumar wrote:
No, we are getting connection resets even when clients are active and entering orders. Moreover, the problem sometimes recurs within 3-4 hours.
You're right. That's not it.
I forgot to recommend an upgrade of your stunnel to the latest version (4.34). I think one of the bugs I fixed in this version may be the cause of your problem.
Another check could be to try select() instead of poll():
1. Edit src/Makefile and remove any "-DHAVE_POLL_H=1" or "-DHAVE_SYS_POLL_H=1"
2. Rebuild stunnel with "make clean && make && make install"
3. Check if "stunnel -version" reports "SELECT" instead of "POLL".
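The last check can also be done mechanically. The Python sketch below is only an illustration: it assumes the version banner keeps the `Sockets:POLL,IPv6` layout shown in the original post, and the helper name `socket_features` is invented here.

```python
import re


def socket_features(version_output):
    """Return the comma-separated entries after 'Sockets:' in the banner.

    Assumes the layout from the original post, e.g.
    'Threading:PTHREAD SSL:ENGINE Sockets:POLL,IPv6 Auth:LIBWRAP'.
    Returns an empty list if no 'Sockets:' field is found.
    """
    match = re.search(r"Sockets:(\S+)", version_output)
    return match.group(1).split(",") if match else []


if __name__ == "__main__":
    # Banner line copied from the "stunnel -version" output in this thread.
    banner = "Threading:PTHREAD SSL:ENGINE Sockets:POLL,IPv6 Auth:LIBWRAP"
    features = socket_features(banner)
    print("uses poll:", "POLL" in features)
    print("uses select:", "SELECT" in features)
```

Feed it the text printed by "stunnel -version" before and after the rebuild; after a successful rebuild the list should contain SELECT in place of POLL.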
Mike
Thanks for your reply. I will put forth the upgrade suggestion.
Could you please answer the questions below?
1) What does "s_poll_wait timeout" mean?
2) Does this error point to the connection between (client & stunnel) or (stunnel & inetd)?
3) Was stunnel resetting connections because of the timeout, or merely reporting (logging) the lost connectivity?
Thanks in advance.
________________________________ From: Michal Trojnara Michal.Trojnara@mirt.net To: stunnel-users@mirt.net Sent: Tue, 18 January, 2011 7:39:02 PM Subject: Re: [stunnel-users] s_poll_wait timeout: connection reset
manoj kumar wrote:
- What does "s_poll_wait timeout" mean?
It means that a time limit was exceeded while waiting for a set of events (e.g. new data available) on a set of sockets.
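As a rough illustration of that behaviour (this is not stunnel's code, which is written in C), the Python sketch below waits for events on a set of sockets with a time limit. The helper name `wait_for_events`, the socket pairs, and the short timeout are all made up for the example, and `select.poll` is assumed to be available, which is the case on Unix-like systems. An empty result corresponds to the timeout case.

```python
import select
import socket


def wait_for_events(socks, timeout_seconds):
    """Wait until any socket in `socks` is readable or the time limit expires.

    Returns the list of readable sockets; an empty list means the wait
    timed out with no events on any of the sockets.
    """
    poller = select.poll()
    by_fd = {}
    for sock in socks:
        poller.register(sock, select.POLLIN)
        by_fd[sock.fileno()] = sock
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_seconds * 1000))
    return [by_fd[fd] for fd, _mask in events]


if __name__ == "__main__":
    # Two connected socket pairs stand in for the client side and the
    # local (inetd) side of one tunnelled connection.
    client_end, client_peer = socket.socketpair()
    local_end, local_peer = socket.socketpair()

    # Nothing has been sent on either pair, so the wait times out.
    print("timed out:", wait_for_events([client_end, local_end], 0.1) == [])

    # Once one peer sends data, the wait returns before the limit.
    client_peer.sendall(b"order")
    ready = wait_for_events([client_end, local_end], 0.1)
    print("client side readable:", ready == [client_end])
```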
- Does this error point to the connection between (client & stunnel) or (stunnel & inetd)?
Both, i.e. there was no data transfer on either socket.
BTW: I would use "exec" + "execargs" instead of "connect" and inetd.
- Was stunnel resetting connections because of the timeout, or merely reporting (logging) the lost connectivity?
Stunnel is both reporting the timeout and resetting the connections.
BTW: You are right. I will redesign timeout messages to be more meaningful.
Mike