Forgot to clear to the end of the screen when restoring a screen buffer.
That worked, for some reason, with Take Command but not with normal
consoles. I don't remember why I didn't resize the screen like a Linux
X terminal emulator does, but that might have made things work a little
better. Right now, there is a scroll bar for apps like less or vi and
that doesn't feel right.

Reorganized _cygtls::signal_debugger to avoid sending anything to the
debugger if we've seen an exception. I think it used to work that way
and I changed it without noting why. It sure seems like, if we don't do
this, gdb will see two signals when there has been a Windows-recognized
exception, and it really does.

Wow. It's hard getting the screen handling stuff working correctly when
the screen buffer is larger than the screen and vice versa. These
changes attempt to use SetConsoleWindowInfo whenever possible so that
the contents of the screen buffer are never wiped out. They also fix
some previously misbehaving "scroll the screen" commands.

Since the signal thread never exits, there is no need for exit_thread
to ever block. So, nuke this code.

While researching the lftp behavior reported here:

http://cygwin.com/ml/cygwin/2013-01/msg00390.html

after a frenzy of rewriting sigflush handling to avoid blocking in the
signal thread (which is now, and always should have been, illegal), it
dawned on me that we're not supposed to be flushing the tty input buffer
every time a signal is received. We're supposed to do this only when
the user types a character (e.g., CTRL-C) which initiates a signal
action. So, I removed sigflush from sigpacket::process and moved it to
tc ()->kill_pgrp (). This function should only be called to send
signals related to the tty, so this should have the desired effect.

Apparently I got the signal handling semantics of select() wrong again
even though I would have sworn that I tested this on Linux and Windows.

select() is apparently *always* interrupted by a signal and *never*
restarts. Hopefully, between the comment added to the code and this
note, I won't make this mistake again.
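
The select() behavior above is easy to see on Linux, whose signal(7) page lists select() among the calls that are never restarted after a signal handler runs, even when SA_RESTART is set. A minimal POSIX sketch, not Cygwin code (the helper select_was_interrupted is hypothetical): it installs a SIGALRM handler with SA_RESTART, arms a 50 ms one-shot timer, and blocks in select() for up to five seconds.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static volatile sig_atomic_t got_alarm;

static void on_alarm(int sig)
{
    (void) sig;
    got_alarm = 1;
}

/* Hypothetical helper: returns 1 if a SIGALRM delivered during select()
   made it fail with EINTR even though SA_RESTART was requested. */
int select_was_interrupted(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;              /* select() ignores this */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return 0;

    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 50000;           /* one shot after 50 ms */
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return 0;

    struct timeval tv = { 5, 0 };          /* far longer than the timer */
    int rc = select(0, NULL, NULL, NULL, &tv);
    int err = errno;
    return rc == -1 && err == EINTR && got_alarm;
}
```

A caller that wants to keep waiting has to loop on EINTR itself and recompute the remaining timeout.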

(This entry should have been checked in with the changes but... I forgot.)

This is a fairly big revamp of the way that Windows signals are handled.
The intent is that all signal decisions should be made by the signal
thread, not by the exception handler.

This required the ability to pass information from the exception handler
to the signal thread, so a si_cyg field was added to siginfo_t. This
contains information needed to generate a "core dump". Hmm. Haven't
checked to see if this breaks Cygwin's hardly-ever-used real core dump

Anyway, I moved signal_exit back into exceptions.cc and removed it from
the sigpacket class. This function is now treated like a signal handler
function: Cygwin will cause it to be dispatched in the context of
whatever thread caught the signal. signal_exit also determines when to
write a stackdump.

The signal-handler thread will no longer ever attempt to exit. It will
just keep processing signals (it will not process real signals after
Cygwin starts shutting down, however). This should make it impossible
for the signal thread to ever block waiting for the process lock since
it now never grabs the process lock. The signal-handler thread will now
notify gdb when it gets a signal but, in theory, gdb should see the
context of the thread which received the signal, not the signal-handler
thread.

(I forgot to mention that cgf-000018 was reverted. Although I never saw
a hang from this, I couldn't convince myself that one wasn't possible.)

This fix attempts to correct a deadlock where, when a true Windows
signal arrives, Windows creates a thread which "does stuff" and attempts
to exit. In the process of exiting, Cygwin grabs the process lock. If
the signal thread has seen the signal and wants to exit, it can't
because the newly-created thread now holds the lock. But, since the new
thread is relying on the signal thread to release its process lock,
it exits and the process lock is never released.

To fix this, I removed calls to _cygtls::signal_exit in favor of
flagging that we were exiting by setting signal_exit_code (almost forgot
to mark that NO_COPY: that would have been fun). The new function
setup_signal_exit() now handles setting things up so that the ReadFile
loop in wait_sig will do the right thing when it terminates. This
function may just Sleep indefinitely if a signal is being sent from a
thread other than the signal thread. wait_sig() was changed so that it
will essentially drop into asynchronous-read mode when a signal which
exits has been detected. The ReadFile loop is exited when we know that
the process is supposed to be exiting and there is nothing else in the

Although I never actually saw this happen, exit_thread() was also
changed to release the process lock and just sleep indefinitely if it
detects that we are exiting.

2012-12-21 cgf-000018

It occurred to me that just getting the process lock during
DLL_THREAD_DETACH in dll_entry() might be adequate to fix this
problem. It's certainly much less intrusive.

There are potential deadlock problems with grabbing a lock in
this code, though, so this check-in will be experimental.

2012-12-21 cgf-000017

The changes in this set are to work around the issue noted here:

http://cygwin.com/ml/cygwin/2012-12/threads.html#00140

The problem is, apparently, that the exit code passed to ExitThread()
will take precedence over the exit code passed to
TerminateProcess/ExitProcess if the thread is the last one exiting.
That's rather amazing...

For the fix, I replaced all calls to ExitThread with exit_thread(). The
exit_thread function creates a handle to the current thread and sends
it in a packet via sig_send(__SIGTHREADEXIT). Then it acquires the
process lock and calls ExitThread.

wait_sig then waits for the handle, which indicates that the thread has
exited, and, when that has happened, releases the process lock on behalf
of the now-defunct thread. wait_sig will now also avoid actually
exiting since that could trigger the same problem.
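
The exit_thread()/wait_sig handshake above can be modeled with POSIX threads. This is a sketch under stated assumptions, not Cygwin's code: the process lock is a binary semaphore (a POSIX mutex may not be unlocked by a thread other than its owner), the __SIGTHREADEXIT packet is reduced to a semaphore plus a shared pthread_t, and pthread_join stands in for waiting on the thread handle. The names exit_packet_ready, worker, and demo_exit_thread are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for the process lock: a semaphore, because the
   thread that releases it is not the thread that acquired it. */
static sem_t process_lock;

/* Hypothetical one-slot "packet" standing in for sig_send(__SIGTHREADEXIT). */
static sem_t exit_packet_ready;
static pthread_t exiting_thread;

/* Model of exit_thread(): hand our identity to the signal thread, take
   the process lock, then exit without releasing the lock ourselves. */
static void exit_thread(void)
{
    exiting_thread = pthread_self();
    sem_post(&exit_packet_ready);
    sem_wait(&process_lock);
    pthread_exit(NULL);
}

/* Model of the wait_sig side: wait until the thread is really gone, then
   release the process lock on behalf of the now-defunct thread. */
static void *wait_sig(void *arg)
{
    (void) arg;
    sem_wait(&exit_packet_ready);
    pthread_join(exiting_thread, NULL);
    sem_post(&process_lock);
    return NULL;
}

static void *worker(void *arg)
{
    (void) arg;
    exit_thread();
    return NULL;                            /* not reached */
}

/* Runs one worker through exit_thread() and returns 1 if the process
   lock is free again afterwards. */
int demo_exit_thread(void)
{
    pthread_t sig, w;
    sem_init(&process_lock, 0, 1);
    sem_init(&exit_packet_ready, 0, 0);
    pthread_create(&sig, NULL, wait_sig, NULL);
    pthread_create(&w, NULL, worker, NULL);
    pthread_join(sig, NULL);                /* wait_sig joined the worker */
    int free_again = sem_trywait(&process_lock) == 0;
    if (free_again)
        sem_post(&process_lock);
    sem_destroy(&exit_packet_ready);
    sem_destroy(&process_lock);
    return free_again;
}
```

In this model the lock is released only after pthread_join confirms that the exiting thread is gone, mirroring the "wait for the handle, then release" order described above.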

Holding process_lock should prevent threads from exiting while a Cygwin
process is shutting down. They will just block forever in that case -

2012-08-17 cgf-000016

While debugging another problem I finally noticed that
sigpacket::process was unconditionally calling tls->set_siginfo prior to
calling setup_handler even though setup_handler could fail. In the
event of two successive signals, that would cause the second signal's
info to overwrite the first even though the signal handler for the first
would eventually be called. Doh.

Fixing this required passing the sigpacket si field into setup_handler.
Making setup_handler part of the sigpacket class seemed to make a lot of
sense so that's what I did. Then I passed the si element into
interrupt_setup so that the infodata structure could be filled out prior
to arming the signal.
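
The ordering problem is easiest to see in a reduced model. In this sketch the types are hypothetical, not Cygwin's real sigpacket or _cygtls: a thread can have at most one handler armed at a time, and arming fails if one is already pending. process_buggy copies the siginfo before trying to arm, and process_fixed copies it only after it knows the handler is being armed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced siginfo. */
struct info {
    int signo;
    int code;
};

/* Hypothetical per-thread state: one armed handler at most. */
struct tls {
    bool handler_armed;     /* a handler is already queued */
    struct info infodata;   /* what the armed handler will be passed */
};

struct sigpacket {
    struct info si;
};

/* Old order: set the siginfo, then try to arm.  On failure the pending
   handler's info has already been overwritten. */
bool process_buggy(struct tls *t, const struct sigpacket *p)
{
    t->infodata = p->si;
    if (t->handler_armed)
        return false;       /* setup failed; signal retried later */
    t->handler_armed = true;
    return true;
}

/* New order: the packet's si travels into the setup step and is copied
   only once arming is certain. */
bool process_fixed(struct tls *t, const struct sigpacket *p)
{
    if (t->handler_armed)
        return false;
    t->infodata = p->si;
    t->handler_armed = true;
    return true;
}
```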

The other changes checked in here eliminate the ResetEvent for
signal_arrived since previous changes to cygwait should handle the
case of spurious signal_arrived detection. Since signal_arrived is
not a manual-reset event, we really should just let the appropriate
WFMO handle it. Otherwise, there is a race where a signal comes in
a "split second" after WFMO responds to some other event. Resetting
signal_arrived at that point would mean that a subsequent WFMO is never
triggered. My current theory is that this is what is causing:

http://cygwin.com/ml/cygwin/2012-08/msg00310.html

2012-08-15 cgf-000015

RIP cancelable_wait. Yay.

2012-08-09 cgf-000014

So, apparently I got it somewhat right before wrt signal handling.
Checking on Linux, it appears that signals will be sent to a thread
which can accept the signal. So resurrecting and extending the
"find_tls" function is in order. This function will return the tls
of any thread which 1) is waiting for a signal with sigwait*() or
2) has the signal unmasked.
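
A minimal sketch of that selection rule, assuming a plain array of per-thread records in place of Cygwin's real threadlist. The name find_tls is borrowed from the entry, but struct thread_rec, thread_rec_init, and this signature are hypothetical. A thread blocked in sigwait*() normally has the signal masked, which is why test 1) cannot be folded into test 2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical per-thread record; Cygwin's _cygtls holds much more. */
struct thread_rec {
    sigset_t sigmask;      /* signals the thread currently blocks */
    sigset_t sigwait_set;  /* signals it waits for in sigwait*() */
    int in_sigwait;        /* nonzero while blocked in sigwait*() */
};

/* Start a thread with every signal blocked and no sigwait*() pending. */
void thread_rec_init(struct thread_rec *t)
{
    sigfillset(&t->sigmask);
    sigemptyset(&t->sigwait_set);
    t->in_sigwait = 0;
}

/* Return the first thread that 1) is waiting for sig in sigwait*() or
   2) has sig unmasked.  In this sketch NULL means no thread can take
   the signal right now. */
struct thread_rec *find_tls(struct thread_rec *threads, size_t n, int sig)
{
    for (size_t i = 0; i < n; i++) {
        struct thread_rec *t = &threads[i];
        if ((t->in_sigwait && sigismember(&t->sigwait_set, sig) == 1)
            || sigismember(&t->sigmask, sig) == 0)
            return t;
    }
    return NULL;
}
```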

In redoing this, it became obvious that I had the class designation wrong
for the threadlist handling, so I moved the manipulation of the global
threadlist into the cygheap where it logically belongs.

2012-07-21 cgf-000013

These changes reflect a revamp of the "wait for signal" functionality
which has existed in Cygwin through several signal massages.

We now create a signal event only when a thread is waiting for a signal
and arm it only for that thread. The "set_signal_arrived" function is
used to establish the event and set it in a location referenceable by

I still do not handle all of the race conditions. What happens when
a signal comes in just after a WF?O succeeds for some other purpose? I
suspect that it will arm the next WF?O call and the subsequent call to
call_signal_handler could cause a function to get an EINTR when possibly

I haven't yet checked all of the test cases for the URL listed in the

2012-06-12 cgf-000012

These changes are the preliminary work for redoing the way threads wait
for signals. The problems are shown by the test case mentioned here:

http://cygwin.com/ml/cygwin/2012-05/msg00434.html

I've known that the signal handling in threads wasn't quite right for
some time. I lost all of my thread signal tests in the great "rm -r"
debacle of a few years ago and have been less than enthusiastic about
redoing everything (I had PCTS tests and everything). But it really is
time to redo this signal handling to make it more like it is supposed to

This change should not introduce any new behavior. Things should
continue to behave as before. The major differences are a change in the
arguments to cancelable_wait and the fact that cygwait now uses
cancelable_wait, so the return values from cygwait now mirror those of
cancelable_wait.

The next change will consolidate cygwait and cancelable_wait into one

2012-06-02 cgf-000011

The refcnt handling was tricky to get right but I had convinced myself
that the refcnts were always incremented/decremented under a lock.
Corinna's 2012-05-23 change to refcnt exposed a potential problem with
dup handling where the fdtab could be updated while not locked.

That should be fixed by this change but, on closer examination, it seems
like there are many places where it is possible for the refcnt to be
updated while the fdtab is not locked since the default for
cygheap_fdget is to not lock the fdtab (and that should be the default -
you can't have read holding a lock).

Since refcnt was only ever called with 1 or -1, I broke it up into two
functions but kept the Interlocked* operation. Incrementing a variable
should not be as racy as adding an arbitrary number to it, but we have
InterlockedIncrement/InterlockedDecrement for a reason, so I kept the
Interlocked operation here.
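
A sketch of the increment/decrement split, with C11 atomics standing in for InterlockedIncrement/InterlockedDecrement (a swap made only so the example is portable). struct fh, fh_init, and the destroyed flag are hypothetical. Using the value returned by the atomic operation, rather than re-reading the counter afterwards, means exactly one caller observes the count reaching zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical file handler; only the reference count is modeled. */
struct fh {
    atomic_long usecount;
    bool destroyed;
};

void fh_init(struct fh *f)
{
    atomic_init(&f->usecount, 0);
    f->destroyed = false;
}

/* Atomic increment, analogous to InterlockedIncrement; returns new value. */
long fh_inc_refcnt(struct fh *f)
{
    return atomic_fetch_add(&f->usecount, 1) + 1;
}

/* Atomic decrement, analogous to InterlockedDecrement.  Only the caller
   whose decrement produced zero tears the object down. */
long fh_dec_refcnt(struct fh *f)
{
    long now = atomic_fetch_sub(&f->usecount, 1) - 1;
    assert(now >= 0);               /* more releases than references */
    if (now == 0)
        f->destroyed = true;        /* stand-in for deleting the fhandler */
    return now;
}
```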

In the meantime, I'll be mulling over whether the refcnt operations are
actually safe as they are. Maybe just ensuring that they are atomically
updated is enough since they control the destruction of an fh. If I got
the ordering right with incrementing and decrementing then that should

2012-06-02 cgf-000010

- Fix emacs problem which exposed an issue with Cygwin's select() function.
  If a signal arrives while select is blocking and the program longjmps
  out of the signal handler then threads and memory may be left hanging.
  Fixes: http://cygwin.com/ml/cygwin/2012-05/threads.html#00275

This was try #4 or #5 to get select() signal handling working right.
It's still not there but it should now at least not leak memory or

I mucked with the interface between cygwin_select and select_stuff::wait
so that the "new" loop in select_stuff::wait() was essentially moved
into the caller. cygwin_select now uses various enum states to decide
what to do. It builds the select linked list at the beginning of the
loop, allowing wait() to tear everything down and restart. This is
necessary before calling a signal handler because the signal handler may

I initially had this all coded up to use a special signal_cleanup
callback which could be called when a longjmp is called in a signal
handler. And cygwin_select() set up and tore down this callback. Once
I got everything compiling, it, of course, dawned on me that just
because you call a longjmp in a signal handler it doesn't mean that you
are jumping *out* of the signal handler. So, if the signal handler
invokes the callback and then returns, that will be very bad for
select(). Hence this slower, but hopefully more correct, implementation.

(I still wonder if some sort of signal cleanup callback might be useful
in the future.)

TODO: I need to do an audit of other places where this problem could be

As alluded to above, select's signal handling is still not right. It
still acts as if it could call a signal handler from something other
than the main thread but, AFAICT, from my STC, this doesn't seem to be
the case. It might be worthwhile to extend cygwait to just magically
figure this out and not even bother using w4[0] for scenarios like this.

2012-05-16 cgf-000009

- Fix broken console mouse handling. Reported here:
  http://cygwin.com/ml/cygwin/2012-05/msg00360.html

I did a cvs annotate on smallprint.cc and saw that the code to translate
%characters > 127 to 0x notation was in the 1.1 revision. Then I
checked the smallprint.c predecessor. It was in the 1.1 version of that
program too, which means that this odd change has probably been around

Since __small_sprintf is supposed to emulate sprintf, I got rid of the
special case handling. This may affect fhandler_socket::bind. If so, we
should work around this problem there rather than keeping this strange
hack in __small_printf.

2012-05-14 cgf-000008

- Fix hang when zero bytes are written to a pty using
  Windows WriteFile or equivalent. Fixes:
  http://cygwin.com/ml/cygwin/2012-05/msg00323.html

cgf-000002, as usual, fixed one thing while breaking another. See
Larry's predicament in: http://goo.gl/oGEr2 .

The problem is that zero-byte writes to the pty pipe caused the dreaded
end-of-the-world-as-we-know-it problem reported on the mailing list
where ReadFile reads zero bytes even though there is still more to read
on the pipe. This is because that change caused a 'record' to be read
and a record can be zero bytes.

I was never really keen on using a throwaway buffer just to get a
count of the number of characters available to be read in the pty pipe.
On closer reading of the documentation for PeekNamedPipe, it seemed like
the sixth argument to PeekNamedPipe (lpBytesLeftThisMessage) should
return what I needed without using a buffer. And, amazingly, it did,
except that the problem still remained: a zero-byte message still
screwed things up.

So, we now detect the case where zero bytes are available in the current
message but there are bytes available in the pipe. In that scenario, we
return the bytes available in the pipe rather than the message length of
zero. This could conceivably cause problems with pty pipe handling in
this scenario but since the only way this scenario could possibly happen
is when someone is writing zero bytes using WriteFile to a pty pipe, I'm
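
The zero-byte-message check described in this entry can be written as a pure function. On Windows the two inputs would come from a call such as PeekNamedPipe (h, NULL, 0, NULL, &total, &left), whose fifth and sixth arguments are lpTotalBytesAvail and lpBytesLeftThisMessage; here they are plain parameters so the logic runs anywhere. pty_bytes_available is a hypothetical name, not the Cygwin function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes to report as available.
   total_bytes_avail       ~ PeekNamedPipe's lpTotalBytesAvail
   bytes_left_this_message ~ PeekNamedPipe's lpBytesLeftThisMessage */
uint32_t pty_bytes_available(uint32_t total_bytes_avail,
                             uint32_t bytes_left_this_message)
{
    /* Normal case: report only what the next message-mode read returns. */
    if (bytes_left_this_message != 0)
        return bytes_left_this_message;
    /* A zero-byte message is at the head of the pipe (or the pipe is
       empty): report the pipe total instead of the message length. */
    return total_bytes_avail;
}
```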

2012-05-14 cgf-000007

- Fix invocation of strace from a cygwin process. Fixes:
  http://cygwin.com/ml/cygwin/2012-05/msg00292.html

The change in cgf-000004 introduced a problem for processes which load
cygwin1.dll dynamically. strace.exe is the most prominent example of

Since the parent handle is now closed for "non-Cygwin" processes, when
strace.exe tried to dynamically load cygwin1.dll, the handle was invalid
and child_info_spawn::handle_spawn couldn't retrieve information
from the parent. This eventually led to a strace_printf error due to an
attempt to dereference an unavailable cygheap. Probably have to fix
this someday. You shouldn't use the cygheap while attempting to print
an error about the unavailability of said cygheap.

This was fixed by saving the parent pid in child_info_spawn and calling
OpenProcess for the parent pid and using that handle iff a process is

2012-05-12 cgf-000006

- Fix hang when calling pthread_testcancel in a canceled thread.
  Fixes some of: http://cygwin.com/ml/cygwin/2012-05/msg00186.html

This should fix the first part of the reported problem in the above
message. The cancel seemed to actually be working but the fprintf
eventually ended up calling pthread_testcancel. Since we'd gotten here
via a cancel, it tried to recursively call the cancel handler causing a

2012-05-12 cgf-000005

- Fix pipe creation problem which manifested as a problem creating a
  fifo. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00253.html

My change on 2012-04-28 introduced a problem with fifos. The passed-in
name was overwritten. This was because I wasn't properly keeping
track of the length of the generated pipe name when there was a
name passed in to fhandler_pipe::create.

There was also another problem in fhandler_pipe::create. Since fifos
use PIPE_ACCESS_DUPLEX and PIPE_ACCESS_DUPLEX is an or'ing of
PIPE_ACCESS_INBOUND and PIPE_ACCESS_OUTBOUND, using PIPE_ACCESS_OUTBOUND
as a "never-used" option for PIPE_ADD_PID in fhandler.h was wrong. So,
fifo creation attempted to add the pid of a pipe to the name which is
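
The PIPE_ACCESS_OUTBOUND flag collision is plain bit arithmetic. The three PIPE_ACCESS_* values below are the documented Win32 ones, repeated so the sketch compiles without <windows.h>. PIPE_ADD_PID_BUGGY mirrors the definition described above; PIPE_ADD_PID_FIXED and its value are purely illustrative, and a real fix would also have to mask such a private bit off before passing the mode to CreateNamedPipe.

```c
#include <assert.h>
#include <stdint.h>

/* Win32 open-mode flags, repeated here for a portable check. */
#define PIPE_ACCESS_INBOUND  0x00000001u
#define PIPE_ACCESS_OUTBOUND 0x00000002u
#define PIPE_ACCESS_DUPLEX   0x00000003u   /* INBOUND | OUTBOUND */

/* The problematic choice: an access bit reused as a private
   "append the pid to the pipe name" flag. */
#define PIPE_ADD_PID_BUGGY   PIPE_ACCESS_OUTBOUND

/* Hypothetical alternative bit, for illustration only. */
#define PIPE_ADD_PID_FIXED   0x10000000u

/* Returns nonzero if create() would append the pid for this mode. */
int wants_pid_in_name(uint32_t mode, uint32_t add_pid_flag)
{
    return (mode & add_pid_flag) != 0;
}
```

A fifo opened with PIPE_ACCESS_DUPLEX therefore always tests positive for the buggy flag.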

2012-05-08 cgf-000004

The change for cgf-000003 introduced a new problem:
http://cygwin.com/ml/cygwin/2012-05/msg00154.html
http://cygwin.com/ml/cygwin/2012-05/msg00157.html

Since a handle associated with the parent is no longer being duplicated
into a non-cygwin "execed child", Windows is free to reuse the pid of
the parent when the parent exits. However, since we *did* duplicate a
handle pointing to the pid's shared memory area into the "execed child",
the shared memory for the pid was still active.

Since the shared memory was still available, if a new process reused the
previous pid, Cygwin would detect that the shared memory had not been
newly created and had a "PID_REAPED" flag. That was considered an error,
and, so, it would set procinfo to NULL and pinfo::thisproc would die
since this situation is not supposed to occur.

I fixed this in two ways:

1) If a shared memory region has a PID_REAPED flag then zero it and
reuse it. This should be safe since you are not really supposed to be
querying the shared memory region for anything after PID_REAPED has been

2) Forgo duping a copy of myself_pinfo if we're starting a non-cygwin
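
Fix 1) amounts to a check like the one below. This is a sketch with hypothetical names: struct pinfo_shared is a tiny stand-in for Cygwin's per-pid shared record, and the PID_REAPED value is invented. For simplicity, any other pre-existing region is still treated as the error case the entry describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag value and reduced per-pid shared record. */
#define PID_REAPED 0x0001u

struct pinfo_shared {
    unsigned process_state;
    int pid;
    int ppid;
};

/* Called when the per-pid region already existed.  Returns true if the
   new process may take it over. */
bool pinfo_claim_existing(struct pinfo_shared *p, int new_pid)
{
    if (p->process_state & PID_REAPED) {
        /* Leftover from a reaped process whose Windows pid was reused:
           nobody should still be querying it, so wipe and reuse it. */
        memset(p, 0, sizeof *p);
        p->pid = new_pid;
        return true;
    }
    return false;   /* sketch only: still treated as the error case */
}
```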

It seems like 2) is a common theme and an audit of all of the handles
that are being passed to non-cygwin children is in order for 1.7.16.

The other minor modification that was made in this change was to add the
pid of the failing process to fork error output. This helps slightly
when looking at strace output, even though in this case it was easy to
find what was failing by looking for '^---' when running the "stv"
strace dumper. That found the offending exception quickly.

2012-05-07 cgf-000003

Don't make Cygwin wait for all children of a non-cygwin child program.
Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00063.html,
http://cygwin.com/ml/cygwin/2012-05/msg00075.html

This problem is due to a recent change which added some robustness and
speed to Cygwin's exec/spawn handling by not trying to force inheritance
every time a process is started. See the ChangeLog entries starting on
2012-03-20, including several on 2012-03-21.

Making the handle inheritable meant that, as usual, there were problems
with non-Cygwin processes. When Cygwin "execs" a non-Cygwin process N,
all of its N + 1, N + 2, ... children will also inherit the handle.
That means that Cygwin will wait until all subprocesses have exited

I was willing to make this a restriction of starting non-Cygwin
processes but the problem with allowing that is that it can cause the
creation of a "limbo" pid when N exits and N + 1 and friends are still
around. In this scenario, Cygwin dutifully notices that process N has
died and sets the exit code to indicate that, but N's parent will wait on
rd_proc_pipe and will only return when every N + ... Windows process

The removal of cygheap::pid_handle was not related to the initial
problem that I set out to fix. The change came from the realization
that we were duping the current process handle into the child twice and
only needed to do it once. The current process handle is used by exec
to keep the Windows pid "alive" so that it will not be reused. So, now
we just close parent in child_info_spawn::handle_spawn iff we're not

In debugging this it bothered me that 'ps' identified a nonactive pid as
active. Part of the reason for this was that the 'parent' handle in
child_info was open in non-Cygwin processes, keeping the pid alive.
That has been kluged around (more changes after 1.7.15) but that didn't
fix the problem. On further investigation, this seems to be caused by
the fact that the shared memory region pid handles were still being
passed to non-cygwin children, keeping the pid alive in a limbo-like
fashion. This was easily fixed by having pinfo::init() consider a
memory region with PID_REAPED as not available. A more robust fix
should be considered for 1.7.15+ where these handles are not passed
to non-cygwin processes.

This fixed the problem where a pid showed up in the list after a user
did something like "bash$ cmd /c start notepad" but, for some reason,
it does not fix the case of "bash$ setsid cmd /c start notepad".
That bears investigation after 1.7.15 is released but it is not a
regression and so is not a blocker for the release.

2012-05-03 cgf-000002

Fix problem where Cygwin attempted to read too much input from a
pty slave. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00049.html

My change on 2012/04/05 reintroduced the problem first described by:
http://cygwin.com/ml/cygwin/2011-10/threads.html#00445

The problem then was, IIRC, due to the fact that bytes sent to the pty
pipe were not written as records. Changing pipe to PIPE_TYPE_MESSAGE in
pipe.cc fixed the problem since writing lines to one side of the pipe
caused exactly that number of characters to be read on the other side,
even if there were more characters in the pipe.

To debug this, I first replaced fhandler_tty.cc with revision 1.258, the
2012/04/05 version. The test case started working when I did that.

So, then, I replaced individual functions, one at a time, in
fhandler_tty.cc with their previous versions. I'd expected this to be a
problem with fhandler_pty_master::process_slave_output since that had
seen the most changes but was surprised to see that the culprit was
fhandler_pty_slave::read().

The reason was that I really needed the bytes_available() function to
return the number of bytes which would be read in the next operation
rather than the number of bytes available in the pipe. That's because
there may be a number of lines available to be read but the number of
bytes which will be read by ReadFile should reflect the mode of the pty
and, if there is a line to read, only the number of bytes in the line
should be seen as available for the next read.
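
A sketch of what bytes_available() should report, under stated assumptions: the queued pipe contents are modeled as a flat buffer, canonical stands for the pty being in line (ICANON) mode, and next_read_size is a hypothetical helper. The entry does not say what to report when no complete line is queued; this sketch simply reports everything queued, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes the next read should be told are available. */
size_t next_read_size(const char *buf, size_t len, bool canonical)
{
    if (!canonical)
        return len;                         /* raw mode: everything */
    const char *nl = memchr(buf, '\n', len);
    if (nl != NULL)
        return (size_t) (nl - buf) + 1;     /* just the first line */
    return len;                             /* assumption: no full line */
}
```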

Having bytes_available() return the number of bytes which would be read
seemed to fix the problem, but it could subtly change the behavior of
other callers of this function. However, I actually think this is
probably a good thing since they probably should have been seeing the

2012-05-02 cgf-000001

Fix problem setting parent pid to 1 when a process with children execs
itself. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00009.html

Investigating this problem with strace showed that ssh-agent was
checking the parent pid and getting a 1 when it shouldn't have. Other
stuff looked OK so I chose to consider this a smoking gun.

Going back to the version that the OP said did not have the problem, I
worked forward until I found where the problem first occurred,
somewhere around 2012-03-19. And, indeed, the getppid call returned the
correct value in the working version. That means that this stopped
working when I redid the way the process pipe was inherited around

It isn't clear why (and I suspect I may have to debug this further at
some point) this hasn't always been a problem but I made the obvious fix.
We shouldn't have been setting ppid = 1 when we're about to pass off to

As I was writing this, I realized that it was necessary to add some
additional checks. Just checking for "have_execed" isn't enough. If
we've execed a non-cygwin process then it won't know how to deal with
any inherited children. So, always set ppid = 1 if we've execed a
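
The resulting rule can be summarized as below. The end of this entry is missing, so this is a reading of it, not the actual code. It assumes that the truncated sentences contrast exec'ing a Cygwin program (which keeps the same Cygwin pid and can take over the children) with exec'ing a non-Cygwin one. ppid_for_children and execed_cygwin are hypothetical names; only have_execed comes from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the ppid children should see once this process
   hands off or goes away. */
int ppid_for_children(int my_pid, bool have_execed, bool execed_cygwin)
{
    /* Exec of a Cygwin program: the same Cygwin pid lives on in the new
       program, so the children keep their parent. */
    if (have_execed && execed_cygwin)
        return my_pid;
    /* Plain exit, or exec of a non-Cygwin program that cannot manage
       inherited children: reparent to 1. */
    return 1;
}
```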