We're facing an issue where the stunnel process running on our AWS EC2 instances enters a zombie state. This results in NFS server timeouts (we're using EFS). We're running Amazon Linux 2 (kernel version 4.14.318-241.531) with stunnel version 5.6.4. The stunnel configuration is shown below:
; Sample stunnel configuration file for Win64 by Michal Trojnara 2002-2022
; Some options used here may be inadequate for your particular configuration
; This sample file does *not* represent stunnel.conf defaults
; Please consult the manual for detailed description of available options

; **************************************************************************
; * Global options                                                         *
; **************************************************************************

; Debugging stuff (may be useful for troubleshooting)
;debug = info
;output = stunnel.log

; Enable FIPS 140-2 mode if needed for compliance
;fips = yes

; Microsoft CryptoAPI engine allows for authentication with private keys
; stored in the Windows certificate store
; Each section using this feature also needs the "engineId = capi" option
;engine = capi
; You also need to disable TLS 1.2 or later, because the CryptoAPI engine
; currently does not support PSS
;sslVersionMax = TLSv1.1
; TLSv1.1 requires security level 0 when compiled OpenSSL 3.0 and later
;securityLevel = 0

; The pkcs11 engine allows for authentication with cryptographic
; keys isolated in a hardware or software token
; MODULE_PATH specifies the path to the pkcs11 module shared library,
; such as softhsm2-x64.dll or opensc-pkcs11.dll
; IMPORTANT: A 64-bit stunnel requires 64-bit PKCS#11 modules
; Each section using this feature also needs the "engineId = pkcs11" option
;engine = pkcs11
;engineCtrl = MODULE_PATH:softhsm2-x64.dll
;engineCtrl = PIN:1234

; **************************************************************************
; * Service defaults may also be specified in individual service sections  *
; **************************************************************************

; Enable support for the insecure SSLv3 protocol
;options = -NO_SSLv3

; These options provide additional security at some performance degradation
;options = SINGLE_ECDH_USE
;options = SINGLE_DH_USE

; **************************************************************************
; * Include all configuration file fragments from the specified folder     *
; **************************************************************************

;include = conf.d

; **************************************************************************
; * Service definitions (at least one service has to be defined)           *
; **************************************************************************

; ***************************************** Example TLS client mode services

[gmail-pop3]
client = yes
accept = 127.0.0.1:110
connect = pop.gmail.com:995
verifyChain = yes
CAfile = ca-certs.pem
checkHost = pop.gmail.com
OCSPaia = yes

[gmail-imap]
client = yes
accept = 127.0.0.1:143
connect = imap.gmail.com:993
verifyChain = yes
CAfile = ca-certs.pem
checkHost = imap.gmail.com
OCSPaia = yes

[gmail-smtp]
client = yes
accept = 127.0.0.1:25
connect = smtp.gmail.com:465
verifyChain = yes
CAfile = ca-certs.pem
checkHost = smtp.gmail.com
OCSPaia = yes

; Encrypted HTTP proxy authenticated with a client certificate
; located in the Windows certificate store
;[example-proxy]
;client = yes
;accept = 127.0.0.1:8080
;connect = example.com:8443
;engineId = capi

; Encrypted HTTP proxy authenticated with a client certificate
; located in a cryptographic token
;[example-pkcs11]
;client = yes
;accept = 127.0.0.1:8080
;connect = example.com:8443
;engineId = pkcs11
;cert = pkcs11:token=MyToken;object=MyCert
;key = pkcs11:token=MyToken;object=MyKey

; ***************************************** Example TLS server mode services

;[pop3s]
;accept = 995
;connect = 110
;cert = stunnel.pem

;[imaps]
;accept = 993
;connect = 143
;cert = stunnel.pem

; Either only expose this service to trusted networks, or require
; authentication when relaying emails originated from loopback.
; Otherwise the following configuration creates an open relay.
;[ssmtp]
;accept = 465
;connect = 25
;cert = stunnel.pem

; TLS front-end to a web server
;[https]
;accept = 443
;connect = 80
;cert = stunnel.pem
; "TIMEOUTclose = 0" is a workaround for a design flaw in Microsoft SChannel
; Microsoft implementations do not use TLS close-notify alert and thus they
; are vulnerable to truncation attacks
;TIMEOUTclose = 0

; Remote cmd.exe protected with PSK-authenticated TLS
; Create "secrets.txt" containing IDENTITY:KEY pairs
;[cmd]
;accept = 1337
;exec = c:\windows\system32\cmd.exe
;execArgs = cmd.exe
;PSKsecrets = secrets.txt

; vim:ft=dosini
Hi,
On 11. Sep 2023, at 12:38, sharukh.khan+stunnel--- via stunnel-users <stunnel-users@stunnel.org> wrote:
> We're facing an issue where the stunnel process running on our AWS Ec2 enters a zombie state. This results in the nfs server timeout (We're using EFS). We're running Amazon Linux 2 (Kernel version 4.14.318-241.531) with stunnel version 5.6.4.
I’m going to use my crystal ball here. It tells me you might be using https://github.com/aws/efs-utils to mount those EFS volumes. Additionally, it tells me that the version of efs-utils you are using is older than 1.33.3, and that you are missing https://github.com/aws/efs-utils/commit/865892d275298da4d735a60296952c7f3c41.... Prior to this commit, the efs-utils watchdog would restart stunnel with stdout and stderr connected to a pipe that the watchdog process never read from. Once the pipe’s buffer was full, the kernel blocked stunnel on its next write to the pipe. To confirm that, check whether the hanging stunnel process uses a pipe on file descriptors 1 and 2 (stdout and stderr) in /proc/$pid/fd/.
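To illustrate the mechanism, here is a minimal Python sketch (not taken from efs-utils or stunnel; the function name pipe_capacity is made up for this example). It writes to a pipe that nobody reads from until the kernel refuses to accept more data. The write end is put into non-blocking mode so the script gets an error instead of hanging; a process doing ordinary blocking writes to its stdout or stderr would simply stop at that point.

```python
import os


def pipe_capacity(chunk_size: int = 4096) -> int:
    """Fill a pipe that is never read from and return how many bytes fit.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode: a full pipe raises BlockingIOError instead of
        # putting this process to sleep the way a blocking writer would be.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        written = 0
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe buffer is full and no reader is draining it.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe filled after {pipe_capacity()} bytes")
```

The number printed depends on the kernel and its pipe settings, so treat it as a measurement of your own system rather than a fixed value.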
Is my crystal ball correct?
If so, this isn’t an issue with stunnel, it’s an issue with efs-utils. Ask for an update from the vendor of your efs-utils package.
HTH, Clemens
Hi Clemens,
Thanks for the heads up. I checked the efs-utils version but to our dismay we're running version 1.33.3.1.
Regards, Sharukh
Hi Sharukh,
On 12. Sep 2023, at 07:04, sharukh.khan+stunnel--- via stunnel-users <stunnel-users@stunnel.org> wrote:
> Thanks for the heads up. I checked the efs-utils version but to our dismay we're running version 1.33.3.1.
Bummer. The symptoms did fit so well.
I’d still recommend checking the file descriptors of one of those hanging stunnel processes just in case they have the same bug in a different location, but it might be an entirely different problem in your case.
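A check of this kind could look roughly like the following Python sketch. It assumes a Linux /proc filesystem; the helper names is_pipe_target and pipe_fds are made up for this example. It reads the symlinks under /proc/<pid>/fd and reports which of the standard descriptors (0, 1 and 2) point at a pipe.

```python
import os
import sys


def is_pipe_target(target: str) -> bool:
    """Linux shows pipe descriptors as symlinks of the form 'pipe:[inode]'."""
    return target.startswith("pipe:[")


def pipe_fds(pid, fds=(0, 1, 2)) -> dict:
    """Return {fd: link target} for the given descriptors that are pipes.

    Hypothetical helper; descriptors that are closed or unreadable
    (for example due to missing permissions) are skipped.
    """
    found = {}
    for fd in fds:
        try:
            target = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            continue
        if is_pipe_target(target):
            found[fd] = target
    return found


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} PID")
    result = pipe_fds(sys.argv[1])
    if result:
        for fd, target in sorted(result.items()):
            print(f"fd {fd} -> {target}")
    else:
        print("none of fds 0, 1, 2 is a pipe (or the process is not readable)")
```

Run it as root (or as the user owning the process) with the PID of a hanging stunnel process as the only argument. A pipe on stdout or stderr only matches the symptom; whether anything is still reading from the other end has to be checked separately.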
I will try that, but most of the affected instances have extremely high iowait, which makes running commands or logging in practically impossible. On the last affected instance, the top command produced no output for 10 minutes.
Thanks for your help Clemens.
Regards, Sharukh Khan