Hi everyone,
I recently upgraded from NixOS 24.05 to 24.11, which moved stunnel from 5.72 to 5.73, and since then stunnel has crashed on me multiple times on different servers (with pretty much identical setups) due to what looks like a memory allocation race. Since 5.73 moved OCSP fetches into a dedicated thread, and the problem never happened with 5.72, it sounds like an issue specific to that change; someone familiar with the codebase probably has a better idea of what's going on. Given that my updates happened sometime mid last month and there have been no certificate renewals since (except for one that wouldn't affect the systems in question; thanks to crt.sh and CT for making that easy to check), the error message "OCSP routines::status expired" also seems out of place, at least for the certificate itself. The staple may well be expired, but in that case stunnel should probably fetch a new one, which may be related to the error in question.
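For what it's worth, the "INTERNAL ERROR: Memory allocated in a different thread" message in the log below looks like it comes from a debug allocator that remembers which thread owns each allocation and aborts on a cross-thread free. Purely as a conceptual illustration of that kind of check (my own sketch, not stunnel's actual str.c/ocsp.c code; `thread_tag`, `tracked_malloc`, and `tracked_free` are made-up names):

```c
/* Conceptual sketch only: a debug allocator that remembers which thread
 * allocated each block and aborts when another thread frees it. This is
 * the kind of check the "Memory allocated in a different thread" abort
 * appears to come from; it is NOT stunnel's actual implementation. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct thread_tag {          /* hidden header stored in front of the user data */
    pthread_t owner;         /* thread that performed the allocation */
};

static void *tracked_malloc(size_t len) {
    struct thread_tag *tag = malloc(sizeof *tag + len);
    if (!tag)
        return NULL;
    tag->owner = pthread_self();        /* remember the allocating thread */
    return tag + 1;                     /* hand out the memory after the tag */
}

static void tracked_free(void *ptr, const char *file, int line) {
    struct thread_tag *tag = (struct thread_tag *)ptr - 1;
    if (!pthread_equal(tag->owner, pthread_self())) {
        fprintf(stderr,
            "INTERNAL ERROR: Memory allocated in a different thread at %s, line %d\n",
            file, line);
        abort();                        /* roughly what the crash below looks like */
    }
    free(tag);
}

static void *worker(void *arg) {
    tracked_free(arg, "example.c", __LINE__);   /* frees a block it did not allocate */
    return NULL;
}

int main(void) {
    void *buf = tracked_malloc(16);     /* allocated in the main thread */
    pthread_t t;
    pthread_create(&t, NULL, worker, buf);
    pthread_join(t, NULL);              /* the worker aborts before this returns */
    return 0;
}
```

If the new per-minute OCSP thread ever frees a staple that was allocated by another thread (or vice versa), a check like this would fire exactly the way the trace below shows.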
```
Dec 06 04:43:02 shell stunnel[1124111]: LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:02 shell stunnel[1124111]: 2024.12.06 04:43:02 LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:02 shell stunnel[1124111]: LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:02 shell stunnel[1124111]: 2024.12.06 04:43:02 LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:02 shell stunnel[1124111]: LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:02 shell stunnel[1124111]: 2024.12.06 04:43:02 LOG5[per-minute]: OCSP: Certificate accepted
Dec 06 04:43:56 shell stunnel[1124111]: LOG5[ui]: Log file reopened
Dec 06 04:43:56 shell stunnel[1124111]: 2024.12.06 04:43:56 LOG5[ui]: Log file reopened
Dec 06 04:44:02 shell stunnel[1124111]: LOG3[per-minute]: OCSP: OCSP_check_validity: crypto/ocsp/ocsp_cl.c:351: error:1380007D:OCSP routines::status expired
Dec 06 04:44:02 shell stunnel[1124111]: 2024.12.06 04:44:02 LOG3[per-minute]: OCSP: OCSP_check_validity: crypto/ocsp/ocsp_cl.c:351: error:1380007D:OCSP routines::status expired
Dec 06 04:44:02 shell stunnel[1124111]: 2024.12.06 04:44:02 LOG4[per-minute]: OCSP: Unknown verification status
Dec 06 04:44:02 shell stunnel[1124111]: INTERNAL ERROR: Memory allocated in a different thread at ocsp.c, line 412
Dec 06 04:44:02 shell stunnel[1124111]: LOG4[per-minute]: OCSP: Unknown verification status
Dec 06 04:44:02 shell stunnel[1124111]: INTERNAL ERROR: Memory allocated in a different thread at ocsp.c, line 412
Dec 06 04:44:02 shell systemd-coredump[282485]: Process 1124111 (stunnel) of user 988 terminated abnormally with signal 6/ABRT, processing...
Dec 06 04:44:02 shell systemd[1]: Started Process Core Dump (PID 282485/UID 0).
Dec 06 04:44:02 shell systemd-coredump[282486]: [🡕] Process 1124111 (stunnel) of user 988 dumped core.

Module libgcc_s.so.1 without build-id.
Module libcap.so.2 without build-id.
Module stunnel without build-id.

Stack trace of thread 1124120:
#0 0x00007f9191c99a9c __pthread_kill_implementation (libc.so.6 + 0x92a9c)
#1 0x00007f9191c47576 raise (libc.so.6 + 0x40576)
#2 0x00007f9191c2f935 abort (libc.so.6 + 0x28935)
#3 0x000056142e23e435 fatal_debug.cold (stunnel + 0xc435)
#4 0x000056142e23eebc get_alloc_list_ptr (stunnel + 0xcebc)
#5 0x000056142e23f0b9 str_detach_debug.part.0 (stunnel + 0xd0b9)
#6 0x000056142e23f856 str_free_debug (stunnel + 0xd856)
#7 0x000056142e259635 ocsp_stapling (stunnel + 0x27635)
#8 0x000056142e25a18f per_minute_thread (stunnel + 0x2818f)
#9 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#10 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 676245:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e241f82 client_run (stunnel + 0xff82)
#3 0x000056142e2438fc client_thread (stunnel + 0x118fc)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 1124111:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e25b5f6 daemon_loop (stunnel + 0x295f6)
#3 0x000056142e23e680 main (stunnel + 0xc680)
#4 0x00007f9191c3127e __libc_start_call_main (libc.so.6 + 0x2a27e)
#5 0x00007f9191c31339 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a339)
#6 0x000056142e23e875 _start (stunnel + 0xc875)

Stack trace of thread 1124119:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e250acd s_poll_sleep (stunnel + 0x1eacd)
#3 0x000056142e25a21c per_second_thread (stunnel + 0x2821c)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 676190:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e241f82 client_run (stunnel + 0xff82)
#3 0x000056142e2438fc client_thread (stunnel + 0x118fc)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 1124121:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e250acd s_poll_sleep (stunnel + 0x1eacd)
#3 0x000056142e259fab per_day_thread (stunnel + 0x27fab)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 676445:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e241f82 client_run (stunnel + 0xff82)
#3 0x000056142e2438fc client_thread (stunnel + 0x118fc)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 676300:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e241f82 client_run (stunnel + 0xff82)
#3 0x000056142e2438fc client_thread (stunnel + 0x118fc)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

Stack trace of thread 676491:
#0 0x00007f9191d090af __poll (libc.so.6 + 0x1020af)
#1 0x000056142e2509f1 s_poll_wait (stunnel + 0x1e9f1)
#2 0x000056142e241f82 client_run (stunnel + 0xff82)
#3 0x000056142e2438fc client_thread (stunnel + 0x118fc)
#4 0x00007f9191c97d02 start_thread (libc.so.6 + 0x90d02)
#5 0x00007f9191d173ac __clone3 (libc.so.6 + 0x1103ac)

ELF object binary architecture: AMD x86-64

Dec 06 04:44:02 shell systemd[1]: stunnel.service: Main process exited, code=dumped, status=6/ABRT
Dec 06 04:44:02 shell systemd[1]: stunnel.service: Failed with result 'core-dump'.
Dec 06 04:44:02 shell systemd[1]: stunnel.service: Consumed 2min 11.812s CPU time, 21.6M memory peak, 19.6M memory swap peak, 50.9M read from disk, 47.1M written to disk, 587.2M incoming IP traffic, 588.8M outgoing IP traffic.
```
Note that stunnel is handling mTLS for my Prometheus setup, which means it really does get hammered: one scrape per exporter, five exporters in this case, so exactly five scrapes per minute, all arriving at the same moment. If this is a race, that makes me much more likely to hit it.
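For context, the setup is essentially stunnel terminating mTLS in front of each exporter. A minimal sketch of what such a service section looks like (the section name, ports, and paths below are placeholders, not my actual config):

```
; sketch of an mTLS-terminating stunnel service in front of a Prometheus exporter
; all names, ports, and paths are placeholders
[node-exporter]
; TLS port that Prometheus scrapes
accept = 0.0.0.0:9101
; plaintext exporter listening on localhost
connect = 127.0.0.1:9100
cert = /etc/stunnel/server-cert.pem
key = /etc/stunnel/server-key.pem
; CA used to verify the scraper's client certificate
CAfile = /etc/stunnel/scraper-ca.pem
verifyChain = yes
requireCert = yes
```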
Also, a huge thank you for building what has been an integral part of my infrastructure for almost a decade now.
Hi stunnel,
Thank you very much for the report. Please try https://www.stunnel.org/downloads/beta/stunnel-5.74b2.tar.gz and let us know whether it fixes the problem.
Best regards, Mike
Hi Mike,
I'm currently rolling out 5.74b2 on my servers. Judging by the two processes that crashed around 1½ hours ago it may take a while to see any effect (and if the bug is fixed, the "effect" is really just the absence of further crashes); those stunnel instances had been running for 5d 17h and 6d 16h respectively.
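On NixOS the rollout basically amounts to an overlay pointing the stunnel package at the beta tarball, roughly like this (a sketch only: the hash is deliberately left as lib.fakeHash so the first build fails and prints the real one, and how the stunnel module picks up the package depends on the rest of the configuration):

```
# sketch: overlay swapping the packaged stunnel for the 5.74b2 beta tarball
final: prev: {
  stunnel = prev.stunnel.overrideAttrs (old: {
    version = "5.74b2";
    src = prev.fetchurl {
      url = "https://www.stunnel.org/downloads/beta/stunnel-5.74b2.tar.gz";
      # placeholder; replace with the real hash after the first (failing) build
      hash = prev.lib.fakeHash;
    };
  });
}
```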
I've disabled automatic restarting of stunnel for the time being, so any further crash should show up as an outage in my monitoring. I'll keep you posted!
Regards, benaryorg
Hi Mike,
my longest-running stunnel on 5.74b2 has now been up for a little over 8 days, and the log line that previously preceded the crash has occurred multiple times, without a crash following it. I am reasonably confident that 5.74b2 fixed the bug. Thank you very much!
Guess now it's time to bug my distro to update to 5.74.
Regards, benaryorg
Oh, something similar happened to me a couple of weeks ago. I have 7 servers, and the stunnel service died on all of them at the same time. I wasn't able to reproduce the issue, but I suspected it could be related to OCSP, so I disabled OCSP in common.h, and everything has worked fine since. It's just a workaround. I also added "Restart=always" to the systemd service, just in case.
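For reference, the restart part of the workaround is just a small systemd override along these lines (a generic sketch; the one-second RestartSec is an arbitrary choice of mine):

```
# /etc/systemd/system/stunnel.service.d/restart.conf
# workaround only: bring stunnel back up automatically if it crashes again
[Service]
Restart=always
RestartSec=1
```

followed by a `systemctl daemon-reload`. That only papers over the crash, of course; the real fix is updating to 5.74.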