Thanks for the excellent response.  It does seem to be a Nagle/buffering issue of some sort.  I set up stunnel as you suggested (so that all connections are localhost -> remotehost) and the problem disappeared.  However, even after explicitly turning TCP_NODELAY off completely in samba and stunnel, restarting, etc., the problem remains in the localhost -> localhost configuration.
 
It'd be nice to fix it, but none of the socket options seem to change anything, so I will probably have to find a workaround, such as keeping the number of files in the directory low or pushing harder to use cifs instead of stunnel.
 
Thanks again for your time.
 
Paul
 
On 4/3/08, Geoff Thorpe <geoff@geoffthorpe.net> wrote:
On Wed, 2008-04-02 at 14:12 +1100, Paul Kerin wrote:
> I've got stunnel debug set to 7 on client and server.  No errors and
> no logging at all except for the initial handshake when the mount is
> created.  Including the tcpdump would probably be excessive at this
> stage.  In summary, using stunnel the data gets transmitted in packets
> usually containing around 200 bytes, whereas without stunnel it's
> mostly 1408 byte packets.
>
> Any suggestions?

Given that you are also experiencing data-expansion overall (despite
getting smaller individual packets), I'm guessing that you are losing
coalescing as data enters the SSL tunnel. I assume the majority of data
is in the server-->client direction if you're doing an "ls", so what is
probably happening is this:

1. in the unencrypted case, small amounts of data are being emitted from
the samba server via socket writes, but they are coalesced within the
TCP stack by Nagle (i.e. with TCP_NODELAY unset).

2. in the encrypted case, samba server output is being emitted locally
into the SSL tunnel which produces delineated SSL records to be written
out on to the network. Even ignoring the expansion of SSL records
relative to small unencrypted inputs, there may also be sufficient
additional processing overhead in stunnel that Nagle doesn't get a
chance to coalesce these 200-byte SSL records into larger TCP packets.
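
For reference, Nagle is active on a TCP socket by default and is switched
off by setting TCP_NODELAY. A minimal Python sketch of checking and
toggling it follows; the helper names nagle_enabled and set_nagle are made
up for illustration, and this says nothing about how samba or stunnel set
the option internally.

```python
import socket

def nagle_enabled(sock):
    """Return True if Nagle's algorithm is active (TCP_NODELAY is 0)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0

def set_nagle(sock, enabled):
    """Enable Nagle by clearing TCP_NODELAY, or disable it by setting it."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)

# An unconnected socket is enough to inspect and change the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

set_nagle(sock, False)              # TCP_NODELAY=1: small writes go out at once
after_disable = nagle_enabled(sock)

set_nagle(sock, True)               # TCP_NODELAY=0: small writes may be coalesced
after_enable = nagle_enabled(sock)

sock.close()
```

Note the inverted sense: turning TCP_NODELAY on turns coalescing off.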

If my hunch is right, the individual writes from the samba server are
probably quite small (perhaps one per file in the directory listing
response) because even once they're encapsulated in SSL, the records are
still only 200 bytes. Depending on the SSL/TLS cipher-suite you're
using, that could be a significant expansion relative to the unencrypted
data. Also, in the unencrypted case (no stunnel), I imagine that if you
ran 10 copies of "openssl speed rsa1024" in a while(1) loop on the
server, you may find that you start getting smaller packets from the
server rather than the nice coalescing you're currently seeing. I.e. if
you slow samba down a little, the Nagle algorithm may start to fail you
in the same way you're already seeing with stunnel.

Also, certain combinations of SSL/TLS version and cipher-suite produce a
lot of extra "chatter" over the wire. Consider playing with the versions
and cipher-suites of your SSL tunnel to see if you can get a tunnel that
gets you a better data rate. E.g. the "default" cipher-suites for openssl
apps are often set to use ephemeral key-exchange cipher-suites because
they have a theoretical security advantage ("perfect forward secrecy"),
but they're also typically a lot bulkier (though this would be most
observable in the initial handshake rather than in the data-plane). Also
try SSLv3 rather than TLSv1, just for kicks...

What else? If the Nagle issue is the key difference here, the localhost
connection between the stunnel server and the samba server is probably
too direct for samba's output to coalesce into larger packets the way it
does when it goes unencrypted onto the network. You could possibly
confirm this by moving the stunnel server to a different machine (even
back on the same machine as the stunnel client, just for curiosity's
sake) so that the samba output is still going unencrypted through a real
network link before it reaches the SSL tunnel. If this is the issue,
then you would observe that the SSL records within the encrypted tunnel
get larger, there should be fewer of them, and the expansion ratio of
data-size between unencrypted and encrypted network data should reduce
significantly (because you're possibly encrypting ~1 KB per SSL record,
rather than encrypting 10-50 bytes per SSL record, as may be the case
currently).
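
The effect of record size on the expansion ratio can be estimated with a
little arithmetic. The sketch below is only an illustration under stated
assumptions: a CBC cipher-suite with a 16-byte block, a 20-byte HMAC-SHA1
MAC, a 5-byte record header and no explicit IV (as in SSLv3/TLSv1). The
suite actually negotiated in this setup is not known, and TCP/IP headers
are ignored. The function names are hypothetical.

```python
def record_size(payload, mac_len=20, block=16, header=5):
    """Bytes on the wire for one SSL record carrying `payload` bytes.

    Assumes a CBC cipher: payload + MAC is padded up to a multiple of
    the block size, with at least one padding byte (the length byte).
    """
    body = payload + mac_len
    padded = (body // block + 1) * block
    return header + padded

def expansion_ratio(payload, **kwargs):
    """Encrypted record size divided by the plaintext payload size."""
    return record_size(payload, **kwargs) / payload

# Compare tiny per-write records with a well-coalesced 1 KB record.
for n in (10, 50, 1024):
    print(n, record_size(n), round(expansion_ratio(n), 2))
```

Under these assumptions a 10-byte write becomes a 37-byte record and a
50-byte write an 85-byte record, while a 1024-byte write becomes 1061
bytes, so the relative overhead shrinks sharply as the records get larger.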

Those are all the thoughts I have for now - let me know what you find
and perhaps we can suggest something in consequence.

Cheers,
Geoff