-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
The problem has been reported in the stunnel-users mailing list:
http://stunnel.mirt.net/pipermail/stunnel-users/2005-May/000482.html http://stunnel.mirt.net/pipermail/stunnel-users/2005-May/000484.html http://stunnel.mirt.net/pipermail/stunnel-users/2005-May/000490.html
on freebsd-ports mailing list:
http://lists.FreeBSD.org/pipermail/freebsd-ports/2005-May/023344.html
and has several FreeBSD problem reports submitted:
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=81001 http://www.FreeBSD.org/cgi/query-pr.cgi?pr=81162 http://www.FreeBSD.org/cgi/query-pr.cgi?pr=81289
Patch that fixes this problem can be found here:
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=82202
It does not break anything if applied with Linux stunnel, so it can be considered "universal". It will be good if we see this (or similar) fix in the next version of stunnel.
The problem is that we cannot (at least on FreeBSD):
1. create context A (cleanup context) 2. create context B and set B's uc_link to point to A 3. swapcontext(3) to B What happens is that B's function returns, A's function passes control to the main thread of execution 4. create context C and set C's uc_link to point to A What happens is that A is exhausted and this will not work as expected 5. swapcontext(3) to C We expect A's function to be executed after C's function returns, but instead exit(3) is called when A's function returns and the whole program terminates normally after serving its second request.
On 2005-06-13, at 19:53, Vasil Dimov wrote:
Patch that fixes this problem can be found here: http://www.FreeBSD.org/cgi/query-pr.cgi?pr=82202
Here is my patch:
*** sthreads.c.old Mon Jun 13 21:34:53 2005 --- sthreads.c Mon Jun 13 22:15:01 2005 *************** *** 63,68 **** --- 63,69 ----
static void ctx_cleanup_func(void) { /* cleanup the active thread */ s_log(LOG_DEBUG, "Context %ld closed", ready_head->id); + makecontext(&ctx_cleanup, ctx_cleanup_func, 0); s_poll_wait(NULL, 0); /* wait on poll() */ }
It's quite simple and it works. 8-) It's on my FTP site, now.
Special thanks to Paul Dekkers for providing me with a testing FreeBSD 5.4 environment.
BTW: It still doesn't work on my UML (User-Mode Linux) host: 2005.06.13 22:43:23 LOG7[30415:0]: CONTEXT 1, FD=3, (IN)->() 2005.06.13 22:43:23 LOG7[30415:0]: CONTEXT 1, FD=5, (IN)->(IN) 2005.06.13 22:43:23 LOG7[30415:1]: Switching from context 1 to context 1 2005.06.13 22:43:23 LOG7[30415:1]: Context switched 2005.06.13 22:43:23 LOG7[30415:1]: TST_SVC accepted FD=6 from 127.0.0.1:3411 2005.06.13 22:43:23 LOG7[30415:1]: Creating a new context 2005.06.13 22:43:23 LOG7[30415:1]: Context 2 created 2005.06.13 22:43:23 LOG7[30415:2]: Switching from context 1 to context 2 Segmentation fault ./stunnel stunnel.conf
It fails on a first non-trivial context switch. Any idea?
Best regards, Mike
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Mon, Jun 13, 2005 at 10:53:00PM +0200, Michal Trojnara wrote:
*** sthreads.c.old Mon Jun 13 21:34:53 2005 --- sthreads.c Mon Jun 13 22:15:01 2005
*** 63,68 **** --- 63,69 ----
static void ctx_cleanup_func(void) { /* cleanup the active thread */ s_log(LOG_DEBUG, "Context %ld closed", ready_head->id);
}makecontext(&ctx_cleanup, ctx_cleanup_func, 0); s_poll_wait(NULL, 0); /* wait on poll() */
It's quite simple and it works. 8-) It's on my FTP site, now.
Changing the currently executing context (with makecontext) does not seem very robust to me, but, however, I am not an ucontext expert (in fact I have not heard about it before beginning to experience problems with stunnel-4.10).
BTW: It still doesn't work on my UML (User-Mode Linux) host: 2005.06.13 22:43:23 LOG7[30415:0]: CONTEXT 1, FD=3, (IN)->() 2005.06.13 22:43:23 LOG7[30415:0]: CONTEXT 1, FD=5, (IN)->(IN) 2005.06.13 22:43:23 LOG7[30415:1]: Switching from context 1 to context 1 2005.06.13 22:43:23 LOG7[30415:1]: Context switched 2005.06.13 22:43:23 LOG7[30415:1]: TST_SVC accepted FD=6 from 127.0.0.1:3411 2005.06.13 22:43:23 LOG7[30415:1]: Creating a new context 2005.06.13 22:43:23 LOG7[30415:1]: Context 2 created 2005.06.13 22:43:23 LOG7[30415:2]: Switching from context 1 to context 2 Segmentation fault ./stunnel stunnel.conf
It fails on a first non-trivial context switch. Any idea?
Hmm, what is this version of stunnel? 4.10 does not print "Switching from context M to context N", "Context switched", etc.
Is the number after the pid showing the thread id? How is it possible that context2 prints "Switching from context 1 to context 2", shouldnt this be printed by context 1?
I have tested 4.10 with yours bsd.patch and it works on Linux 2.4.18-14, Red Hat Linux release 8.0 (Psyche).
Do you confirm that stunnel begins to crash after applying bsd.patch on your User-Mode Linux?
Does the Context 2 function start execution at all, or the program crashes before that?
Some of the obvious reasons for the above crash can be:
1. invalid pointer given as a first argument to swapcontext(), e.g. nowhere to save the old context. 2. the new context (second argument) is somehow inconsistent. This includes the whole pointer, inaccessible function given to makecontext in initializaion, the argument not initialized with getcontext, uc_link is bogus, or something else.
I noticed that backtraces of ucontexting programs are not very useful, but can you retrieve a backtrace from the core file? What does it show? Running the program via gdb and stepping while the crash occurs may also be helpful.
Regards, Vasil
On 2005-06-14, at 14:08, Vasil Dimov wrote:
makecontext(&ctx_cleanup, ctx_cleanup_func, 0);
Changing the currently executing context (with makecontext) does not seem very robust to me, but, however, I am not an ucontext expert (in fact I have not heard about it before beginning to experience problems with stunnel-4.10).
makecontext(3) only prepares the structure. setcontext(2) modifies the context.
Hmm, what is this version of stunnel?
4.10 + some diagnostic code.
4.10 does not print "Switching from context M to context N", "Context switched", etc.
Is the number after the pid showing the thread id? How is it possible that context2 prints "Switching from context 1 to context 2", shouldnt this be printed by context 1?
Context 2 is already in the ready queue. The code is: /* switch the context */ if(fds) { /* swap the context */ s_log(LOG_DEBUG, "Switching from context %ld to context %ld", ctx->id, ready_head->id); swapcontext(&ctx->ctx, &ready_head->ctx); s_log(LOG_DEBUG, "Context switched"); return ready_head->ready; } else { /* drop the context */ setcontext(&ready_head->ctx); ioerror("setcontext"); /* should not ever happen */ return 0; }
Do you confirm that stunnel begins to crash after applying bsd.patch on your User-Mode Linux?
No. It has nothing to do with bsd.patch
Does the Context 2 function start execution at all, or the program crashes before that?
The crash is on swapcontext().
I noticed that backtraces of ucontexting programs are not very useful, but can you retrieve a backtrace from the core file? What does it show?
It seems smashed.
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 16384 (LWP 704)] 0x40020bae in _pthread_cleanup_push_defer () from /lib/libpthread.so.0 (gdb) bt #0 0x40020bae in _pthread_cleanup_push_defer () from /lib/libpthread.so.0 #1 0x401f702c in vfprintf () from /lib/libc.so.6 #2 0x401f2ec0 in vfprintf () from /lib/libc.so.6 #3 0x401fb94f in fprintf () from /lib/libc.so.6 #4 0x0804df97 in s_log (level=134571040, format=0x402d5440 "\207(??\207T-@\207T-@\207T-@\207T-@\207T-@\207T-@\207T-@\210T-@") at log.c:152 #5 0x0804b514 in client (arg=0x807d960) at client.c:106 #6 0x401f0674 in makecontext () from /lib/libc.so.6 #7 0x0807d960 in ?? () #8 0x0805eca0 in next_id () #9 0x00000002 in ?? () #10 0x00000000 in ?? () #11 0x0805eca0 in next_id () #12 0x08085ed0 in ?? () #13 0x00000000 in ?? () #14 0x00010000 in ?? () #15 0x00000000 in ?? () #16 0x00000000 in ?? () #17 0x00000000 in ?? () #18 0x00000000 in ?? () #19 0xbffff99c in ?? () #20 0x00000009 in ?? ()
Best regards, Mike
Here is the whole backtrace:
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 16384 (LWP 1662)] 0x40020bae in _pthread_cleanup_push_defer () from /lib/libpthread.so.0 (gdb) bt #0 0x40020bae in _pthread_cleanup_push_defer () from /lib/libpthread.so.0 #1 0x401f702c in vfprintf () from /lib/libc.so.6 #2 0x401f2ec0 in vfprintf () from /lib/libc.so.6 #3 0x401fb94f in fprintf () from /lib/libc.so.6 #4 0x0804df97 in s_log (level=134571040, format=0x402d5440 "\207(??\207T-@\207T-@\207T-@\207T-@\207T-@\207T-@\207T-@\210T-@") at log.c:152 #5 0x0804b514 in client (arg=0x807d960) at client.c:106 #6 0x401f0674 in makecontext () from /lib/libc.so.6 #7 0x0807d960 in ?? () #8 0x0805eca0 in next_id () #9 0x00000002 in ?? () #10 0x00000000 in ?? () #11 0x0805eca0 in next_id () #12 0x08085ed0 in ?? () #13 0x00000000 in ?? () #14 0x00010000 in ?? () #15 0x00000000 in ?? () #16 0x00000000 in ?? () #17 0x00000000 in ?? () #18 0x00000000 in ?? () #19 0xbffff99c in ?? () #20 0x00000009 in ?? () #21 0xbffff948 in ?? () #22 0x08095ec4 in ?? () #23 0x00000001 in ?? () #24 0x00000000 in ?? () #25 0x00000000 in ?? () #26 0x00000000 in ?? () #27 0x00000000 in ?? () #28 0x00000000 in ?? () #29 0x0804b4e0 in alloc_client_session () at client.c:92 #30 0x08054412 in create_client (ls=613270282, s=59548, arg=0x402d5440, cli=0x402d5440 <_IO_2_1_stderr_>) at sthreads.c:133 Previous frame inner to this frame (corrupt stack?)
Interesting, isn't it?
Best regards, Mike
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Wed, Jun 15, 2005 at 04:31:16PM +0200, Michal Trojnara wrote:
Here is the whole backtrace:
...
#29 0x0804b4e0 in alloc_client_session () at client.c:92 #30 0x08054412 in create_client (ls=613270282, s=59548, arg=0x402d5440, cli=0x402d5440 <_IO_2_1_stderr_>) at sthreads.c:133 Previous frame inner to this frame (corrupt stack?)
Interesting, isn't it?
Yes, it seems that the stack is totally corrupted with all those 0x00000000s on it.
So the problem may not be in swapcontext() or ucontext at all. Maybe the stack gets corrupted just before the crash (swapcontext call) or 50 lines before that?
You should try to run the prog via gdb and set some breakpoint nearby (but before) swapcontext call and then step carefully and inspect what happens after each move, especially the contents of ctx->ctx and ready_head->ctx and all their members and sub-members.
....
OK I see in the mailing list that the bug is hunted already. It would be really hard to debug that problem. Thanks to https://listman.redhat.com/archives/phil-list/2003-January/msg00097.html
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Wed, Jun 15, 2005 at 03:59:55PM +0200, Michal Trojnara wrote:
On 2005-06-14, at 14:08, Vasil Dimov wrote:
makecontext(&ctx_cleanup, ctx_cleanup_func, 0);
Changing the currently executing context (with makecontext) does not seem very robust to me, but, however, I am not an ucontext expert (in fact I have not heard about it before beginning to experience problems with stunnel-4.10).
makecontext(3) only prepares the structure. setcontext(2) modifies the context.
I meant "Changing" in the means of alter/modify/edit/mutate, not switch/swap/replace. But however, the patch seems to work.
So we have "4.10 + some diagnostic code" that crashes in swapcontext() under Linux. Knowing that 4.10 does not crash under Linux we are forced to deduce that the problem is in the diagnostic code :)
OR
I misunderstood something (again :)) and what we have is that 4.10 with or without the diagnostic code crashes only on your "User Mode Linux"? If this is the case what kind of Linux is that? Broken libc/ucontext implementation?
Regards, Vasil
On 2005-06-15, at 17:32, Vasil Dimov wrote:
makecontext(&ctx_cleanup, ctx_cleanup_func, 0);
Changing the currently executing context (with makecontext) does not seem very robust to me, but, however, I am not an ucontext expert (in fact I have not heard about it before beginning to experience problems with stunnel-4.10).
makecontext(3) only prepares the structure. setcontext(2) modifies the context.
I meant "Changing" in the means of alter/modify/edit/mutate, not switch/swap/replace. But however, the patch seems to work.
By "context" I mean the actual CPU state, not the structure used as a parameter to set it. 8-)
So we have "4.10 + some diagnostic code" that crashes in swapcontext() under Linux. Knowing that 4.10 does not crash under Linux we are forced to deduce that the problem is in the diagnostic code :)
The problem is solved! It was linuxthreads/ucontext conflict in glibc.
Best regards, Mike
Quoting Michal Trojnara Michal.Trojnara@mirt.net:
On 2005-06-15, at 17:32, Vasil Dimov wrote:
makecontext(&ctx_cleanup, ctx_cleanup_func, 0);
Changing the currently executing context (with makecontext) does not seem very robust to me, but, however, I am not an ucontext expert (in fact I have not heard about it before beginning to experience problems with stunnel-4.10).
makecontext(3) only prepares the structure. setcontext(2) modifies the context.
I meant "Changing" in the means of alter/modify/edit/mutate, not switch/swap/replace. But however, the patch seems to work.
By "context" I mean the actual CPU state, not the structure used as a parameter to set it. 8-)
So we have "4.10 + some diagnostic code" that crashes in swapcontext() under Linux. Knowing that 4.10 does not crash under Linux we are forced to deduce that the problem is in the diagnostic code :)
The problem is solved! It was linuxthreads/ucontext conflict in glibc.
I've apparently found something else as I did a bit of testing and it was fine but within 1 minute of putting it on a production machine it sigsegv'd again.
I unfortunately can't leave it running to debug more.
brian