Richard.Hall wrote:
First-time post (though long-time satisfied occasional user). Searched the archives but if there is anything relevant, I've obviously missed it.
I really cannot decide if this is an stunnel problem or an OpenSSL problem.
Client is remote and running stunnel 4.12 on i686-pc-linux-gnu UCONTEXT+POLL+IPv4 with OpenSSL 0.9.6i
Server is local, I've been through 4.12 + 0.9.6m 4.13 + 0.9.6m and now trying stunnel 4.13 on sparc-sun-solaris2.8 UCONTEXT+POLL+IPv4 with OpenSSL 0.9.7i (I've resisted the urge to try OpenSSL 0.9.8* as people seem to be experiencing 'issues')
Small amounts of data are fine. But for larger amounts the connection goes to sleep. The way I read it, the server end stops trying to write to the SSL side. The relevant parts of
# truss -w2 -v poll -dl /usr/local/stunnel-4.13/sbin/stunnel
look like this: (apologies for doing it like this, but it seemed the easiest way to combine the debug output and truss output)
(fd=7 is the SSL side, fd=8 the local side, TIMEOUTidle is set to 35 seconds for demo purposes)
/1: 15.5722 read(8, " A B\0\n03\007\t\t\r\n\t".., 8178) = 8178 /1: poll(0x00049DD8, 3, 35000) (sleeping...) /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=7 ev=POLLIN|POLLOUT rev=POLLIN /1: 17.3166 poll(0x00049DD8, 3, 35000) = 1 /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=7 ev=POLLIN|POLLOUT rev=POLLOUT /1: 17.3229 write(7, "1703\0\0188CBEC9 : O\t u".., 16442) = 16442 /1: 17.3234 poll(0x00049DD8, 4, 35000) = 1 /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=8 ev=POLLIN rev=POLLIN /1: fd=7 ev=POLLIN rev=0
So far, so good.
/1: 17.3238 read(8, "\t\t\t\r\n\t\t\t\r\n\t\t".., 16384) = 16384 /1: 17.3240 poll(0x00049DD8, 3, 35000) = 1 /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=7 ev=POLLIN|POLLOUT rev=POLLOUT /1: 17.3299 write(7, "1703\0\018ED 1AD91A888 i".., 16442) = 14480
For whatever reason, the write() was only partial. I'm guessing n/w issues, as the client is a loooong way away ;-)
/1: 17.3303 write(7, " fE3DDA8FE ! i8DD8\fEBEC".., 1962) Err#11 EAGAIN /1: 17.3308 write(2, 0x00071D60, 74) = 74 /1: 2 0 0 5 . 1 0 . 2 4 1 4 : 5 0 : 3 0 L O G 7 [ 2 0 6 4 7 : 2 /1: ] : S S L _ w r i t e r e t u r n e d W A N T _ W R I T E /1: : r e t r y i n g /1: 17.3312 write(2, "\n", 1) = 1 /1: 17.3315 read(7, 0x000778D8, 5) Err#11 EAGAIN /1: 17.3318 write(2, 0x00071D60, 72) = 72 /1: 2 0 0 5 . 1 0 . 2 4 1 4 : 5 0 : 3 0 L O G 7 [ 2 0 6 4 7 : 2 /1: ] : S S L _ r e a d r e t u r n e d W A N T _ R E A D : /1: r e t r y i n g /1: 17.3321 write(2, "\n", 1) = 1
So we've now done an SSL_write, which yielded SSL_ERROR_WANT_WRITE, _and_ an SSL_read, which yielded SSL_ERROR_WANT_READ, (and which I think removes the SSL_ERROR_WANT_WRITE condition, as we can only have one or the other.)
This means that, the next time round the loop (in client.c:transfer()), want_rd is TRUE, want_wr is FALSE, so we don't poll the SSL connection for output, only input:
/1: poll(0x00049DD8, 3, 35000) (sleeping...) /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=7 ev=POLLIN rev=POLLOUT
But there won't be any input (trust me on that!), so we time out ...
/1: 52.3387 poll(0x00049DD8, 3, 35000) = 0 /1: fd=3 ev=POLLIN rev=0 /1: fd=6 ev=POLLIN rev=0 /1: fd=7 ev=POLLIN rev=0 /1: 52.3394 write(2, 0x00071D60, 72) = 72 /1: 2 0 0 5 . 1 0 . 2 4 1 4 : 5 1 : 0 5 L O G 6 [ 2 0 6 4 7 : 2 /1: ] : s _ p o l l _ w a i t t i m e o u t : c o n n e c t i /1: o n r e s e t /1: 52.3399 write(2, "\n", 1) = 1 /1: 52.3403 write(2, 0x00071DD0, 102) = 102 /1: 2 0 0 5 . 1 0 . 2 4 1 4 : 5 1 : 0 5 L O G 5 [ 2 0 6 4 7 : 2 /1: ] : C o n n e c t i o n r e s e t : 5 7 6 1 7 b y t e s /1: s e n t t o S S L , 1 3 1 b y t e s s e n t t o /1: s o c k e t
Now I have never before needed (nor wanted!) to delve into the inner workings of either stunnel or OpenSSL, so I may be in completely the wrong ballpark here. But I cannot understand the logic of the following bit of code in transfer() :-
if((c->sock_ptr && !want_rd) || /* socket input buffer not empty */ ssl_closing==CL_INIT /* need to send close_notify */ || ((c->ssl_ptr<BUFFSIZE || ssl_closing==CL_RETRY) && want_wr)) /* want to SSL_read or SSL_shutdown but write to the * underlying socket needed for the SSL protocol */ s_poll_add(&c->fds, c->ssl_wfd->fd, 0, 1);
In particular, if I change the first line to
if(c->sock_ptr || /* socket input buffer not empty */
then everything starts working (correctly as far as I can see). The OpenSSL man pages seem (to me) to say that it is perfectly permissible to call SSL_write when we have SSL_ERROR_WANT_READ.
Help?
I actually had started noticing the exact same behavior but hadn't had time to look into it. Mine seemed only to happen on images, leaving HTML pages alone but my HTML isn't all that big anyway comparatively.