My stunnel implementation works fine when the network is plugged in but it does not start at all, which stops the whole boot process, when there is no network connected on the machine (as a client).
As extra information:
- I'm using "delay=yes" - I'm using an fqdn (e.g: stunnel.mydomain.com) for the connections - Using stunnel 4.53 - Using ubuntu 12.04 but this also happened with centos5 previously however after some time it did boot on centos which doesn't happen in ubuntu
How can this be avoided or a timeout specified?
Doing an strace as suggested by symcbean shows the following (including the last part where it hangs):
[...] --- SIGCHLD (Child exited) @ 0 (0) --- rt_sigreturn(0x11) = 0 close(3) = 0 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6039 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff9ce0c79d0) = 6046 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6046 --- SIGCHLD (Child exited) @ 0 (0) --- rt_sigreturn(0x11) = 6046 write(1, "[Started: /etc/stunnel/stunnel.c"..., 37) = 37 write(1, "stunnel.\n", 9) = 9 exit_group(0) = ? [...]
stunnel hangs in this line: wait4(-1,
and when i plug in the network cable it continues to show [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6046
Any help would be appreciated as this is critically blocking our deployment
http://serverfault.com/questions/429072/stunnel-delaying-boot
Muschi Stevenson wrote:
Doing an strace as suggested by symcbean shows the following
The suggestion was good. Unfortunately the sample you collected only contains execution flow of your shell while processing /etc/init.d/stunnel4 (or equivalent) startup script. To also collect the trace of stunnel execution use "-f" option of strace.
Mike
Hi Mike an others,
I tried to isolate the part that keeps looping indefinitely with some information obfuscated for privacy, feel free to request any other part of the strace process that could be of help.
From what i could see stunnel just keeps looping trying to resolve the
domain ("c\6\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com" => stunnel.mydomain.com) and never advances from there stalling the whole boot process.
I left the strace running for roughly 2 hours and the machine still didn't boot.
[...] uname({sys="Linux", node="nodename", ...}) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "c\6\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "c\6\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "c\6\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "c\6\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 close(7) = 0 13486 close(8) = 0 13486 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=192, ...}) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "^S\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "^S\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "^S\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "^S\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 close(7) = 0 13486 close(8) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "&\315\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "&\315\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "&\315\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "&\315\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 close(7) = 0 13486 close(8) = 0 13486 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=192, ...}) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "?k\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "?k\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "?k\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "?k\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 close(7) = 0 13486 close(8) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "(\343\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "(\343\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "(\343\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "(\343\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 close(7) = 0 13486 close(8) = 0 13486 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 13486 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 13486 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 13486 nanosleep({1, 0}, 0x7fff3a3c44f0) = 0 13486 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=192, ...}) = 0 13486 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7 13486 fstat(7, {st_mode=S_IFREG|0644, st_size=261, ...}) = 0 13486 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6812b9000 13486 read(7, "127.0.0.1\tlocalhost\n127.0.1.1\tkv"..., 4096) = 261 13486 read(7, "", 4096) = 0 13486 close(7) = 0 13486 munmap(0x7fa6812b9000, 4096) = 0 13486 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=192, ...}) = 0 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 13486 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.10.20.1")}, 16) = 0 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "j\312\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 13486 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 13486 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) 13486 sendto(8, "j\312\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=8, events=POLLIN}], 1, 5000) = 0 (Timeout) 13486 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) 13486 sendto(7, "j\312\1\0\0\1\0\0\0\0\0\0\3stunnel\vmydomain\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 13486 poll([{fd=7, events=POLLIN}], 1, 5000) = ? ERESTART_RESTARTBLOCK (To be restarted) 13476 <... wait4 resumed> 0x7fffad2b832c, 0, NULL) = ? ERESTARTSYS (To be restarted) 13486 --- SIGINT (Interrupt) @ 0 (0) --- 13476 --- SIGINT (Interrupt) @ 0 (0) --- 13476 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call) 13476 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 0, NULL) = 13486 13476 --- SIGCHLD (Child exited) @ 0 (0) --- 13476 rt_sigreturn(0x11) = 13486 13476 rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0 13476 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7f4ea78fb4c0}, {0x40f100, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f4ea78fb4c0}, 8) = 0 13476 tgkill(13476, 13476, SIGINT) = 0 13476 --- SIGINT (Interrupt) @ 0 (0) --- 13476 +++ killed by SIGINT +++ [...]
On Thu, Sep 20, 2012 at 5:56 PM, Michal Trojnara Michal.Trojnara@mirt.net wrote:
Muschi Stevenson wrote:
Doing an strace as suggested by symcbean shows the following
The suggestion was good. Unfortunately the sample you collected only contains execution flow of your shell while processing /etc/init.d/stunnel4 (or equivalent) startup script. To also collect the trace of stunnel execution use "-f" option of strace.
Mike _______________________________________________ stunnel-users mailing list stunnel-users@stunnel.org https://www.stunnel.org/cgi-bin/mailman/listinfo/stunnel-users
On Monday, 24 of September 2012, Muschi Stevenson wrote:
I left the strace running for roughly 2 hours and the machine still didn't boot.
It looks like getaddrinfo() library function returns EAI_AGAIN intead of a permanent error on your system. Can you please send us your configuration file (I expect the error may depend on the order of options) and the output of ltrace (the usage is similar to strace)?
Mike
########################################################### # The config file: ########################################################### ; change the UID and GID of the process for security reasons setuid = stunnel4 setgid = stunnel4
; The PID file pid = /var/lib/stunnel4/stunnel.pid
; Certificate CAfile=/somewhere/ca.crt cert=/somewhere/cert.pem verify=2
; enable client mode client = yes
; Debug debug = 3 output = /somewhere/stunnel.log
; Configure our services
[someservice] accept=12121 connect=stunnel.mydomain.com:12121 delay=NO
;seconds to wait for expected data TIMEOUTbusy=10 ;seconds to wait for close_notify (set to 0 for buggy MSIE) TIMEOUTclose=10 ;seconds to connect remote host (10 seconds) TIMEOUTconnect=10 ;seconds to keep an idle connection (1 hour) TIMEOUTidle=3600
###########################################################
########################################################### # The ltrace relevant part ########################################################### 11827 memcpy(0x7fff220a6110, "connect=stunnel.mydomain.com:12121"..., 16384) = 0x7fff220a6110 11827 __ctype_b_loc() = 0x7f1c27d7c6b8 11827 strlen("connect=stunnel.mydomain.com:12121"...) = 33 11827 strchr("connect=stunnel.mydomain.com:12121"..., '=') = "=stunnel.mydomain.com:12121" 11827 strlen("connect") = 7 11827 strcasecmp("connect", "accept") = 2 11827 strcasecmp("connect", "CApath") = 14 11827 strcasecmp("connect", "CAfile") = 14 11827 strcasecmp("connect", "cert") = 10 11827 strcasecmp("connect", "ciphers") = 6 11827 strcasecmp("connect", "client") = 3 11827 strcasecmp("connect", "connect") = 0 11827 strlen("stunnel.mydomain.com:12121") = 25 11827 pthread_getspecific(0, 0x415374, 80, 24, 0) = 0x2614820 11827 calloc(1, 84) = 0x02618870 11827 strcpy(0x026188a0, "stunnel.mydomain.com:12121") = 0x026188a0 11827 strlen("stunnel.mydomain.com:12121") = 25 11827 pthread_getspecific(0, 0x415374, 80, 24, 88) = 0x2614820 11827 calloc(1, 84) = 0x026188d0 11827 strcpy(0x02618900, "stunnel.mydomain.com:12121") = 0x02618900 11827 strrchr("stunnel.mydomain.com:12121", ':') = ":12121" 11827 getaddrinfo("stunnel.mydomain.com", "12121", 0x7fff220a2060, 0x7fff220a2058 <unfinished ...> 11827 --- SIGINT (Interrupt) --- 11827 +++ killed by SIGINT +++
###########################################################
On Mon, Sep 24, 2012 at 5:14 PM, Michal Trojnara Michal.Trojnara@mirt.net wrote:
On Monday, 24 of September 2012, Muschi Stevenson wrote:
I left the strace running for roughly 2 hours and the machine still didn't boot.
It looks like getaddrinfo() library function returns EAI_AGAIN intead of a permanent error on your system. Can you please send us your configuration file (I expect the error may depend on the order of options) and the output of ltrace (the usage is similar to strace)?
Mike
stunnel-users mailing list stunnel-users@stunnel.org https://www.stunnel.org/cgi-bin/mailman/listinfo/stunnel-users
On 2012-09-24 17:53, Muschi Stevenson wrote:
[someservice] accept=12121 connect=stunnel.mydomain.com:12121 delay=NO
This is what I expected. Move "delay" before "connect". Preferably to global options (i.e. before any service sections).
Let me know if this workaround works, so I can fix the root cause.
Mike
That seems to have solved it however I also had to change delay to delay=yes as it still doesn't work with delay=no.
delay=yes is ok for me as from what I could gather this makes a DNS resolution everytime it attempts to start a tunnel. Is that correct?
During testing if the IP was changed for "stunnel.mydomain.com" with delay=no stunnel would never adapt to the new IP, I remember reading that delay=yes would solve this particular case also. Again, is that correct?
Would be helpful if someone could explain this option more in detail as i couldn't take much from the man page http://www.stunnel.org/static/stunnel.html
On Mon, Sep 24, 2012 at 6:05 PM, Michal Trojnara Michal.Trojnara@mirt.net wrote:
On 2012-09-24 17:53, Muschi Stevenson wrote:
[someservice] accept=12121 connect=stunnel.mydomain.com:12121 delay=NO
This is what I expected. Move "delay" before "connect". Preferably to global options (i.e. before any service sections).
Let me know if this workaround works, so I can fix the root cause.
Mike
stunnel-users mailing list stunnel-users@stunnel.org https://www.stunnel.org/cgi-bin/mailman/listinfo/stunnel-users