Hi,
We've been running a rsync service over stunnel for some time now. After upgrading our environment, which involved moving from an old version of Linux to FreeBSD 9.3, and from stunnel-4.29 to stunnel-5.06, we've had a problem pop up.
Ever since switching over, we see that rsync daemon processes slowly build up over time on our server, each consuming 100% CPU. The vast majority of our clients are working just fine, and the problematic rsync processes just seem to have been dead/aborted connections from the clients (which are using wireless aircard, so connectivity can be shotty).
A quick 'truss' on the process shows they are doing nothing but sitting there and read()'ing in a loop indefinitely: # truss -p 14689 | head read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0)
Furthermore, the connection appears to be closed: # sockstat -4 | grep 14689 root rsync 14689 3 tcp4 127.0.0.1:873 127.0.0.1:27724 netstat -an | grep 27724 netstat: kvm not available: /dev/mem: No such file or directory tcp4 0 0 127.0.0.1.873 127.0.0.1.27724 CLOSED
Looking through the stunnel logs, it looks like no data was ever really passed (perhaps the client gave up?): 2014.10.22 10:45:52 LOG7[34384941056]: Service [rsync] started 2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] accepted connection from 1.1.1.1:63095 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): before/accept initialization 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client hello A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server hello A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write certificate A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server done A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client key exchange A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read finished A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write change cipher spec A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write finished A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data 2014.10.22 10:45:52 LOG7[34384941056]: 41 items in the session cache 2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects (SSL_connect()) 2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects that finished 2014.10.22 10:45:52 LOG7[34384941056]: 0 client renegotiations requested 2014.10.22 10:45:52 LOG7[34384941056]: 110 server connects (SSL_accept()) 2014.10.22 10:45:52 LOG7[34384941056]: 107 server connects that finished 2014.10.22 10:45:52 LOG7[34384941056]: 0 server renegotiations requested 2014.10.22 10:45:52 LOG7[34384941056]: 66 session cache hits 2014.10.22 10:45:52 LOG7[34384941056]: 0 external session cache hits 2014.10.22 10:45:52 LOG7[34384941056]: 32 session cache misses 2014.10.22 10:45:52 LOG7[34384941056]: 0 session cache timeouts 2014.10.22 10:45:52 LOG6[34384941056]: No peer certificate received 2014.10.22 10:45:52 LOG6[34384941056]: SSL accepted: new session negotiated 2014.10.22 10:45:52 LOG6[34384941056]: Negotiated SSLv3 ciphersuite AES256-SHA (256-bit encryption) 2014.10.22 10:45:52 LOG6[34384941056]: Compression: null, expansion: null 2014.10.22 10:45:52 LOG6[34384941056]: s_connect: connecting 127.0.0.1:873 2014.10.22 10:45:52 LOG5[34384941056]: s_connect: connected 127.0.0.1:873 2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] connected remote server from 127.0.0.1:27724 2014.10.22 10:45:52 LOG7[34384941056]: Remote socket (FD=11) initialized 2014.10.22 11:01:12 LOG3[34384941056]: SSL_read: Connection reset by peer (54) 2014.10.22 11:01:12 LOG5[34384941056]: Connection reset: 56 byte(s) sent to SSL, 56 byte(s) sent to socket 2014.10.22 11:01:12 LOG7[34384941056]: Remote socket (FD=11) closed 2014.10.22 11:01:12 LOG7[34384941056]: Local socket (FD=10) closed 2014.10.22 11:01:12 LOG7[34384941056]: Service [rsync] finished (4 left)
The clients are generally field units running stunnel versions in the 4.x range. Unfortunately, updating them is not an easy task and is not really under my control. Either way, this seems like a server side issue, and possibly an issue with rsync (we're running 3.1.0).
Has anyone run into anything like this before? Does anyone have any suggestions for additional debugging we can do to try to pinpoint the problem? We are going to try rolling back to stunnel-4.29 to see if the problem still occurs there on the new system.
Thanks,
Steve
This turned out to be an issue with rsync-3.1.0, and fixed in rsync-3.1.1: - Fixed a bug in the early daemon protocol where a timeout failed to be honored (e.g. if the remote side fails to send us the initial protocol greeting).
Thanks,
Steve
From: stunnel-users [mailto:stunnel-users-bounces@stunnel.org] On Behalf Of Polyack, Steve Sent: Wednesday, October 22, 2014 12:36 PM To: stunnel-users@stunnel.org Cc: Saathoff, Graham Subject: [stunnel-users] stunnel service leaves stale connections to rsync on the server side
Hi,
We've been running a rsync service over stunnel for some time now. After upgrading our environment, which involved moving from an old version of Linux to FreeBSD 9.3, and from stunnel-4.29 to stunnel-5.06, we've had a problem pop up.
Ever since switching over, we see that rsync daemon processes slowly build up over time on our server, each consuming 100% CPU. The vast majority of our clients are working just fine, and the problematic rsync processes just seem to have been dead/aborted connections from the clients (which are using wireless aircard, so connectivity can be shotty).
A quick 'truss' on the process shows they are doing nothing but sitting there and read()'ing in a loop indefinitely: # truss -p 14689 | head read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0) read(6,0x7fffffff8240,1) = 0 (0x0)
Furthermore, the connection appears to be closed: # sockstat -4 | grep 14689 root rsync 14689 3 tcp4 127.0.0.1:873 127.0.0.1:27724 netstat -an | grep 27724 netstat: kvm not available: /dev/mem: No such file or directory tcp4 0 0 127.0.0.1.873 127.0.0.1.27724 CLOSED
Looking through the stunnel logs, it looks like no data was ever really passed (perhaps the client gave up?): 2014.10.22 10:45:52 LOG7[34384941056]: Service [rsync] started 2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] accepted connection from 1.1.1.1:63095 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): before/accept initialization 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client hello A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server hello A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write certificate A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server done A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client key exchange A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read finished A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write change cipher spec A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write finished A 2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data 2014.10.22 10:45:52 LOG7[34384941056]: 41 items in the session cache 2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects (SSL_connect()) 2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects that finished 2014.10.22 10:45:52 LOG7[34384941056]: 0 client renegotiations requested 2014.10.22 10:45:52 LOG7[34384941056]: 110 server connects (SSL_accept()) 2014.10.22 10:45:52 LOG7[34384941056]: 107 server connects that finished 2014.10.22 10:45:52 LOG7[34384941056]: 0 server renegotiations requested 2014.10.22 10:45:52 LOG7[34384941056]: 66 session cache hits 2014.10.22 10:45:52 LOG7[34384941056]: 0 external session cache hits 2014.10.22 10:45:52 LOG7[34384941056]: 32 session cache misses 2014.10.22 10:45:52 LOG7[34384941056]: 0 session cache timeouts 2014.10.22 10:45:52 LOG6[34384941056]: No peer certificate received 2014.10.22 10:45:52 LOG6[34384941056]: SSL accepted: new session negotiated 2014.10.22 10:45:52 LOG6[34384941056]: Negotiated SSLv3 ciphersuite AES256-SHA (256-bit encryption) 2014.10.22 10:45:52 LOG6[34384941056]: Compression: null, expansion: null 2014.10.22 10:45:52 LOG6[34384941056]: s_connect: connecting 127.0.0.1:873 2014.10.22 10:45:52 LOG5[34384941056]: s_connect: connected 127.0.0.1:873 2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] connected remote server from 127.0.0.1:27724 2014.10.22 10:45:52 LOG7[34384941056]: Remote socket (FD=11) initialized 2014.10.22 11:01:12 LOG3[34384941056]: SSL_read: Connection reset by peer (54) 2014.10.22 11:01:12 LOG5[34384941056]: Connection reset: 56 byte(s) sent to SSL, 56 byte(s) sent to socket 2014.10.22 11:01:12 LOG7[34384941056]: Remote socket (FD=11) closed 2014.10.22 11:01:12 LOG7[34384941056]: Local socket (FD=10) closed 2014.10.22 11:01:12 LOG7[34384941056]: Service [rsync] finished (4 left)
The clients are generally field units running stunnel versions in the 4.x range. Unfortunately, updating them is not an easy task and is not really under my control. Either way, this seems like a server side issue, and possibly an issue with rsync (we're running 3.1.0).
Has anyone run into anything like this before? Does anyone have any suggestions for additional debugging we can do to try to pinpoint the problem? We are going to try rolling back to stunnel-4.29 to see if the problem still occurs there on the new system.
Thanks,
Steve