On 27 Oct 2006, at 09:31, Michal Trojnara wrote:
On Thursday 26 October 2006 17:54, Dario Mariani wrote:
I'm deploying stunnel on some servers. I ran some tests and never had problems. For example, I tried 5k parallel connections without any problem.
Thank you for the information. What is your platform (hardware, operating system)?
Well, it was only a test I did to help a user understand the concept of ulimit :) With the tunneling on (and a "ulimit -n 8192" in the stunnel.init script :) ), we ran 5k "telnet localhost 10001 &" or something similar. It wasn't a big stress test... The system worked like a charm: all connections went through fine (until the Oracle listener closed the connection, of course). I don't remember the exact system, but it was Solaris 9 on big iron (I think some Sun Fire 6800/15K/25K with 52 or 56 CPUs at 1.2-1.33 GHz and more than 200 GB of RAM).
In a few days (around the end of _next_ week, sorry) I can give you the results of some performance tests we did... but they are not very detailed :)
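For anyone repeating this kind of ulimit check, here is a minimal sketch that compares a planned connection count against the open-file limit of the current process. The function name, the `fds_per_connection` parameter and the `reserved` headroom are all hypothetical; how many descriptors a tunnelled connection really costs is an assumption you would have to measure on your own setup.

```python
import resource


def descriptor_budget(connections, fds_per_connection, reserved=16):
    """Return (needed, soft_limit, fits) for the current process.

    fds_per_connection and reserved are assumptions supplied by the caller,
    not values taken from stunnel.
    """
    # Soft limit is what "ulimit -n" reports for this process.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = connections * fds_per_connection + reserved
    # RLIM_INFINITY means the limit is not capped at all.
    fits = soft == resource.RLIM_INFINITY or needed <= soft
    return needed, soft, fits


if __name__ == "__main__":
    needed, soft, fits = descriptor_budget(5000, fds_per_connection=1)
    print(f"need {needed} descriptors, soft limit {soft}, fits: {fits}")
```

Run it from the same shell (or init script) that starts the daemon, since the limit is inherited per process.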
2006.10.20 16:00:58 LOG7[20302:75]: SSL_read returned WANT_READ: retrying
[cut]
It completed correctly within a couple of minutes on an ibook 64 1.33 with 1g of ram, but with LOADS of want_read and want_write errors on both sides of stunnel.
They're not errors! They're debug (LOG7) messages. The message does not indicate anything wrong by itself.
Debugging should be only enabled when you're trying to diagnose a problem - not in a production system.
What is the problem (besides those debug messages)?
The problem is this: the system works well for about 45 minutes, then emits these messages and hangs. Simple and useless :( The traffic "shape" is that of a data warehouse: a small number of connections (a few tens, I think) carrying a heavy load from the DB (stunnel server) to the app server (stunnel client), with peaks every 15 minutes. And I _think_ there are sometimes big uploads (SQL updates) from the client to the server. This is what I understood from asking around :)
Now I'm a little confused... The server started giving these debug messages and then HUNG HORRIBLY within minutes. :) In the tests I ran on my laptop I got the same debug messages, but everything worked well and in the expected time (the path netcat 120 MB file -> stunnel client -> stunnel server -> openssl s_server >/dev/null took 20 seconds!!!). So at this point I think the problem isn't the WANT_READ debug messages themselves, but something that may (or may not) be related to them.
What I'm asking is:
- What _exactly_ do these messages mean? Reading some OpenSSL-related forums, I saw that this message is emitted by the server when the read buffer is empty and the server is waiting for data.
- Do you have any idea where I should direct my analysis?
How can I reproduce the hang mentioned in the subject?
Well, I have a problem with this point: I CANNOT put stunnel back on the system that had the problem until I fix the problem :(
Excuse my lack of precision and detail, but these are chaotic days here :)
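As a rough illustration of what WANT_READ means at the API level, here is a minimal sketch using Python's ssl module with in-memory BIOs. This is not how stunnel is written (it is C code calling OpenSSL directly); the function name and the placeholder hostname are hypothetical. It starts a TLS handshake with no peer attached: the TLS layer queues its first message and then reports that it cannot go on until more bytes arrive. That is a "come back later" signal, not a failure.

```python
import ssl


def handshake_without_peer():
    """Start a TLS handshake with nothing on the other end.

    Returns (want_read_raised, bytes_queued_for_peer).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # bytes received from the peer (none yet)
    outgoing = ssl.MemoryBIO()  # bytes the TLS layer wants to send
    # "example.invalid" is a placeholder; nothing is resolved or contacted.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.invalid")
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # The handshake needs data from the peer that has not arrived yet.
        # A real program would wait for the socket to become readable,
        # feed the bytes into `incoming`, and call do_handshake() again.
        return True, outgoing.pending
    return False, outgoing.pending


if __name__ == "__main__":
    waiting, queued = handshake_without_peer()
    print(f"WANT_READ raised: {waiting}, bytes queued for the peer: {queued}")
```

Seeing this condition repeatedly is normal for non-blocking TLS I/O. A hang would point to the retry never being triggered or the peer never sending, which is a separate question from the message itself.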
Bye, dario.