coredumps and/or CPU eating zombies after dlopen/fork

Thomas Wolff towo@towo.net
Fri Nov 25 15:38:11 GMT 2022



On 25/11/2022 at 14:22, Dmitry Karasik wrote:
> URL: http://karasik.eu.org/misc/cygwin/
>
> Dear all,
>
> Here is an exception that occurs if gtk_settings_get_default() is called from a
> DLL and a fork() call is made afterwards.  The bug is not observed if the call
> is made in the main program, nor is it observed if the GTK initialization is
> done but gtk_settings_get_default() is not called.
>
> Warning: If you run ./dlload.exe without the CYGWIN environment variable set to
> invoke dumper, which terminates the process, your system will accumulate
> zombie-like copies of dlload.exe that eat CPU. strace shows that these zombie
> processes repeatedly hit exceptions in endless loops. The following strace
> output is repeated forever after the fork:
>
> --- Process 9108 (pid: 10439), exception c0000005 at 00000003f5baa8e0
>   1960   21097 [main] perl 10439 exception::handle: In cygwin_except_handler exception 0xC0000005 at 0x3F5BAA8E0 sp 0xFFFFC5A8
>     16   21113 [main] perl 10439 exception::handle: In cygwin_except_handler signal 11 at 0x3F5BAA8E0
>     14   21127 [main] perl 10439 try_to_debug: debugger_command 'dumper "./dlload.exe"'
>     23   21150 [main] perl 10439 break_here: break here
>     12   21162 [main] perl 10439 sig_send: sendsig 0x13C, pid 10439, signal 11, its_me 1
>     14   21176 [main] perl 10439 sig_send: wakeup 0x3F4
>     15   21191 [main] perl 10439 sig_send: Waiting for pack.wakeup 0x3F4
>     19   21210 [sig] perl 10439 sigpacket::process: returning -1
>     19   21229 [sig] perl 10439 wait_sig: signalling pack.wakeup 0x3F4
>     17   21246 [main] perl 10439 sig_send: returning 0x0 from sending signal 11
>
> I encountered this problem when I saw random perl and python scripts hanging
> (apparently waiting for a forked child that never ended); when I pressed ^C,
> I noticed the accumulation of zombie processes.
>
> The dumper's coredump doesn't show the culprit, but it does show this:
> (gdb) bt
> #0  0x00007ffa4870d744 in ntdll!ZwDelayExecution () from C:/WINDOWS/SYSTEM32/ntdll.dll
> #1  0x00007ffa4601b03e in SleepEx () from C:/WINDOWS/System32/KERNELBASE.dll
> #2  0x000000018006205a in try_to_debug () from C:/cygwin64/bin/cygwin1.dll
> #3  0x00000001800624f6 in exception::handle(_EXCEPTION_RECORD*, void*, _CONTEXT*, _DISPATCHER_CONTEXT*) () from C:/cygwin64/bin/cygwin1.dll
> #4  0x00007ffa4871241f in ntdll!.chkstk () from C:/WINDOWS/SYSTEM32/ntdll.dll
> #5  0x00007ffa486c14a4 in ntdll!RtlRaiseException () from C:/WINDOWS/SYSTEM32/ntdll.dll
> #6  0x00007ffa48710f4e in ntdll!KiUserExceptionDispatcher () from C:/WINDOWS/SYSTEM32/ntdll.dll
> #7  0x00000003f5baa8e0 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> which seems to indicate that the exception is somewhere in the Cygwin runtime.
> I haven't got around to finding out exactly where that bug in the runtime is,
> as I'd first like to hear whether there are any smart strategies for doing that.
>
> Nor did I succeed in reducing gtk_settings_get_default() to something more
> manageable (that call was already the most reduced case), even though I
> recompiled gtk3 locally. Strangely, its strace doesn't show anything
> suspicious: no forks, no open sockets, no pipe calls, just file openings
> (see strace.gsettings).
>
> Kindly advise how to proceed if I can help fix this; so far I'm a bit stuck.
I had trouble with dlopen myself, until I found that it cannot be nested,
that is, when a library loaded via dlopen uses dlopen itself.
In my case, it helped to add the flags RTLD_LAZY | RTLD_GLOBAL to the dlopen call.
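
A minimal sketch of what passing those flags looks like. The helper name
open_global is hypothetical and not taken from the test case above; whether
this changes anything for the gtk/fork crash is untested.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a shared library with RTLD_LAZY | RTLD_GLOBAL.
 * RTLD_LAZY defers function symbol resolution until first use;
 * RTLD_GLOBAL makes the library's symbols available to libraries
 * loaded afterwards.
 * Passing NULL as path returns a handle for the main program.
 */
static void *open_global(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        /* dlerror() describes the most recent dlopen failure */
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)", dlerror());
    }
    return handle;
}
```

The returned handle is then used with dlsym() as usual and released with
dlclose() when no longer needed.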

>
> Otherwise, to reproduce, download and unpack http://karasik.eu.org/misc/cygwin/cygwin-gtk-dlopen-fork-bug.tar
> and run ./try there.
>


