Hi,
We've been running an rsync service over stunnel for some time now. After upgrading our environment, which involved moving from an old version of Linux to FreeBSD 9.3, and from stunnel-4.29 to stunnel-5.06, we've had a problem pop up.
Ever since switching over, we see rsync daemon processes slowly build up over time on our server, each consuming 100% CPU. The vast majority of our clients are working just fine, and the problematic rsync processes seem to be left over from dead/aborted client connections (the clients use wireless aircards, so connectivity can be spotty).
A quick 'truss' on one of these processes shows it doing nothing but calling read() in a loop indefinitely, with every call returning 0:
# truss -p 14689 | head
read(6,0x7fffffff8240,1) = 0 (0x0)
read(6,0x7fffffff8240,1) = 0 (0x0)
read(6,0x7fffffff8240,1) = 0 (0x0)
read(6,0x7fffffff8240,1) = 0 (0x0)
read(6,0x7fffffff8240,1) = 0 (0x0)
read(6,0x7fffffff8240,1) = 0 (0x0)
Furthermore, the connection appears to be closed:
# sockstat -4 | grep 14689
root rsync 14689 3 tcp4 127.0.0.1:873 127.0.0.1:27724
# netstat -an | grep 27724
netstat: kvm not available: /dev/mem: No such file or directory
tcp4 0 0 127.0.0.1.873 127.0.0.1.27724 CLOSED
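For reference, a read() that returns 0 on a stream socket is the end-of-file indication, which fits the CLOSED state above. The following is only a minimal Python sketch of that symptom, not rsync's actual code: the function name count_zero_reads is made up for illustration, and it uses a local socket pair rather than a TCP connection through stunnel. It shows that once the peer is gone, every further read returns immediately with zero bytes, so a loop that retries instead of treating 0 as EOF would spin at full CPU:

```python
import socket

def count_zero_reads(attempts=5):
    """Close one end of a socket pair and count zero-byte reads on the other.

    Hypothetical helper for illustration only.
    """
    a, b = socket.socketpair()
    b.close()  # the peer goes away, as with a dropped client
    zero_reads = 0
    for _ in range(attempts):
        data = a.recv(1)  # at EOF this returns b"" right away instead of blocking
        if data == b"":
            zero_reads += 1
    a.close()
    return zero_reads

if __name__ == "__main__":
    print(count_zero_reads())
```

If that is what is happening here, the open question is why the rsync daemon keeps retrying instead of exiting on EOF.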
Looking through the stunnel logs, it looks like almost no data was passed (56 bytes in each direction) before the connection was reset about 15 minutes later (perhaps the client gave up?):
2014.10.22 10:45:52 LOG7[34384941056]: Service [rsync] started
2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] accepted connection from 1.1.1.1:63095
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): before/accept initialization
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client hello A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server hello A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write certificate A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server done A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client key exchange A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read finished A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write change cipher spec A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write finished A
2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data
2014.10.22 10:45:52 LOG7[34384941056]: 41 items in the session cache
2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects (SSL_connect())
2014.10.22 10:45:52 LOG7[34384941056]: 0 client connects that finished
2014.10.22 10:45:52 LOG7[34384941056]: 0 client renegotiations requested
2014.10.22 10:45:52 LOG7[34384941056]: 110 server connects (SSL_accept())
2014.10.22 10:45:52 LOG7[34384941056]: 107 server connects that finished
2014.10.22 10:45:52 LOG7[34384941056]: 0 server renegotiations requested
2014.10.22 10:45:52 LOG7[34384941056]: 66 session cache hits
2014.10.22 10:45:52 LOG7[34384941056]: 0 external session cache hits
2014.10.22 10:45:52 LOG7[34384941056]: 32 session cache misses
2014.10.22 10:45:52 LOG7[34384941056]: 0 session cache timeouts
2014.10.22 10:45:52 LOG6[34384941056]: No peer certificate received
2014.10.22 10:45:52 LOG6[34384941056]: SSL accepted: new session negotiated
2014.10.22 10:45:52 LOG6[34384941056]: Negotiated SSLv3 ciphersuite AES256-SHA (256-bit encryption)
2014.10.22 10:45:52 LOG6[34384941056]: Compression: null, expansion: null
2014.10.22 10:45:52 LOG6[34384941056]: s_connect: connecting 127.0.0.1:873
2014.10.22 10:45:52 LOG5[34384941056]: s_connect: connected 127.0.0.1:873
2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] connected remote server from 127.0.0.1:27724
2014.10.22 10:45:52 LOG7[34384941056]: Remote socket (FD=11) initialized
2014.10.22 11:01:12 LOG3[34384941056]: SSL_read: Connection reset by peer (54)
2014.10.22 11:01:12 LOG5[34384941056]: Connection reset: 56 byte(s) sent to SSL, 56 byte(s) sent to socket
2014.10.22 11:01:12 LOG7[34384941056]: Remote socket (FD=11) closed
2014.10.22 11:01:12 LOG7[34384941056]: Local socket (FD=10) closed
2014.10.22 11:01:12 LOG7[34384941056]: Service [rsync] finished (4 left)
The clients are generally field units running stunnel versions in the 4.x range. Unfortunately, updating them is not an easy task and is not really under my control. Either way, this looks like a server-side issue, and possibly an issue with rsync itself (we're running 3.1.0).
Has anyone run into anything like this before? Does anyone have any suggestions for additional debugging we can do to try to pinpoint the problem? We are going to try rolling back to stunnel-4.29 to see if the problem still occurs there on the new system.
Thanks,
Steve