After a fair bit of poking, prodding, and googling, I have yet to find the solution to my problem. So, here goes:
stunnel looks to be running slowly. Very slowly. We're starting to profile it with gprof to see where it spends all of its time when we're trying to do content negotiation. In the meantime, I've whipped up a quick-and-dirty Ruby script that grabs data from the server and then spits out the performance results. I'm sitting on my own T1, so our connectivity is pretty good.
The results from this script are much the same with 'delay=yes', 'session=600', and 'compression=zlib' set.
So, here's what happens when I nab data from our stunnel-ed server with said script (I can supply the script if anyone is interested):
#File       Size       time   thrput          time   thrput    multiple
----------------------------------------------------------------------
index.html    7k  http 0.1548 ( 46k/s)  https 0.7449 (  9k/s)  * 4.8131
test1.jpg     1k  http 0.0928 ( 12k/s)  https 0.6379 (  1k/s)  * 6.8726
test2.jpg    10k  http 0.1812 ( 59k/s)  https 0.6743 ( 16k/s)  * 3.7208
test3.jpg    19k  http 0.2236 ( 85k/s)  https 0.7677 ( 24k/s)  * 3.4338
test4.jpg    53k  http 0.5668 ( 94k/s)  https 1.1543 ( 46k/s)  * 2.0365
test5.jpg    97k  http 0.7861 (123k/s)  https 2.1974 ( 44k/s)  * 2.7954
test6.jpg   214k  http 1.4061 (152k/s)  https 2.3735 ( 90k/s)  * 1.6880
test7.jpg   140k  http 0.9434 (149k/s)  https 1.9331 ( 72k/s)  * 2.0491
test8.jpg   470k  http 2.8590 (164k/s)  https 3.7696 (124k/s)  * 1.3185
('multiple' is how many times slower https is than http, i.e. the https time divided by the http time.)
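For reference, the throughput and 'multiple' columns boil down to two divisions. This is only a minimal sketch of the arithmetic, not the actual script; the helper names are made up for illustration, and I'm assuming sizes in kilobytes and times in seconds:

```ruby
# Hypothetical helpers; the names are illustrative and not taken from the real script.

# Throughput in kilobytes per second for a transfer of size_kb that took `seconds`.
def throughput_kbps(size_kb, seconds)
  size_kb / seconds.to_f
end

# How many times slower https was than http for the same file.
def slowdown_multiple(http_seconds, https_seconds)
  https_seconds / http_seconds.to_f
end

# Example using the index.html row from the table (rounded values).
http_time  = 0.1548
https_time = 0.7449
puts format("http  %.0fk/s", throughput_kbps(7, http_time))
puts format("https %.0fk/s", throughput_kbps(7, https_time))
puts format("multiple %.4f", slowdown_multiple(http_time, https_time))
```

Since the table shows rounded sizes and times, recomputing from it gives figures that differ slightly from the printed columns.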
And, here's my stunnel config:
***
cert = /etc/certs/combined.pem
setuid = nobody
setgid = nobody
pid = /var/run/stunnel/stunnel.pid
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
debug = 0
output = /var/log/stunnel

[https]
local = 64.40.110.16
accept = 443
connect = 64.40.110.16:80
TIMEOUTclose = 0
***
So, any ideas why stunnel is working so slowly? I mean, I know there's the overhead of the SSL negotiation, and I've heard some things about a stunnel session cache, but nothing about enabling/using it (unless 'session = something bigger than zero' does so?).
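To get a feel for how much of the gap is a fixed per-request cost rather than a per-byte cost, it may help to look at the absolute difference instead of the ratio. A rough sketch using the timings from the table above (the constant and helper names are made up for illustration):

```ruby
# [file, http_seconds, https_seconds], copied from the results table.
TIMINGS = [
  ["index.html", 0.1548, 0.7449],
  ["test1.jpg",  0.0928, 0.6379],
  ["test2.jpg",  0.1812, 0.6743],
  ["test3.jpg",  0.2236, 0.7677],
  ["test4.jpg",  0.5668, 1.1543],
  ["test5.jpg",  0.7861, 2.1974],
  ["test6.jpg",  1.4061, 2.3735],
  ["test7.jpg",  0.9434, 1.9331],
  ["test8.jpg",  2.8590, 3.7696]
].freeze

# Extra seconds https took over http for each file, rounded to 4 places.
def extra_seconds(timings)
  timings.map { |name, http, https| [name, (https - http).round(4)] }
end

extra_seconds(TIMINGS).each do |name, extra|
  puts format("%-12s +%.4fs", name, extra)
end
```

For the files up to 53k, the extra time stays between roughly 0.49 and 0.59 seconds regardless of size, which is what makes me suspect a fixed per-connection cost.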
Any suggestions to improve performance, especially for large batches of small transfers?
Thanks-in-advance!