From: G. David Butler gdb@dbS...
Subject: calling sleep() while holding lock() (fwd)
Date: Tue, 20 May 1997 14:14:27 -0500

Sorry for letting this sit, but life happens... Back to our story.

From ivan@ncu... (Eivind Sarto):
>> ---------- Forwarded message ----------
>> From: "G. David Butler" <gdb@dbS...>
>>
>> A summary of the changes:
>>
>
>Wow! That was quick. Thanks.
>As I mentioned in my previous mail, there are some locking violations
>in the VM code that only show up under heavy load.

Or not so heavy load....

snip....

>These are caused by uncachepage() being called with one or more locks
>held. uncachepage() can call putimage(), which may close the chan.

Yes, you are very right!

>The solution I implemented was to make putimage not close the chan. If
>it was the last reference to the image, it returns the Chan*; otherwise
>it returns NULL. uncachepage needs to be slightly recoded, and it must
>also return the Chan* pointer to its caller. Whoever called
>uncachepage can then close the Chan after the locks have been released
>(if it was the last reference to the image).

I agree with your approach. The only place that is a little hard is the
end of duppage() in page.c. For the moment I have a panic to guard
against a double Chan return. If I ever see one I will have to fix it
properly. A reading of the code says the panic is possible.

>These changes spared me from making radical changes to the page code.
>Just some minor changes to putimage, uncachepage and whoever calls them.

Are radical changes necessary, at least from this perspective?

>I think we fixed a couple of locking violations in the streams code, too.
>One place even has a comment about holding a spin lock and sleeping.

I don't see that comment. I've been using the patches I sent before to
find violations, but haven't seen any there.

>I'll be happy to answer any questions.

Ok, here is the next big one.
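The deferred-close approach quoted above can be sketched roughly as follows. This is a minimal user-space model, not the actual kernel code: the `Lock`, `Chan`, `Image` and `Page` structs are simplified stand-ins, `palloc` and `reclaim()` are hypothetical names for "some caller that holds a lock around uncachepage", and `nlocks`/`closed_with_lock` exist only so the sketch can check that the close happens with no spin lock held. putimage() and uncachepage() follow the change described in the mail: they return the Chan* on the last reference instead of closing it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel's types. */
typedef struct Lock  { int held; } Lock;
typedef struct Chan  { int closed; } Chan;
typedef struct Image { Lock lk; int ref; Chan *c; } Image;
typedef struct Page  { Image *image; } Page;

static int nlocks;            /* spin locks currently held */
static int closed_with_lock;  /* set if cclose ever runs under a lock */
static Lock palloc;           /* hypothetical page-allocator lock */

static void lock(Lock *l)   { assert(!l->held); l->held = 1; nlocks++; }
static void unlock(Lock *l) { assert(l->held);  l->held = 0; nlocks--; }

/* Closing a Chan may sleep, so it must never run while a spin lock is held. */
static void cclose(Chan *c)
{
    if (nlocks != 0)
        closed_with_lock = 1;
    c->closed = 1;
}

/* Drop one reference. On the last one, hand the Chan back instead of
   closing it here, because the caller may be holding spin locks. */
static Chan *putimage(Image *i)
{
    Chan *c = NULL;

    lock(&i->lk);
    if (--i->ref == 0) {
        c = i->c;
        i->c = NULL;
    }
    unlock(&i->lk);
    return c;
}

/* Detach a page from its image and pass any returned Chan up the stack. */
static Chan *uncachepage(Page *p)
{
    Image *i = p->image;

    if (i == NULL)
        return NULL;
    p->image = NULL;
    return putimage(i);
}

/* A caller that holds a lock around uncachepage closes the Chan only
   after every lock has been released. */
static void reclaim(Page *p)
{
    Chan *c;

    lock(&palloc);
    c = uncachepage(p);
    unlock(&palloc);
    if (c != NULL)
        cclose(c);
}
```

With two pages sharing one image, the first reclaim() only drops the reference count; the second gets the Chan back and closes it after `palloc` is released. The cost of this design is that every caller up the chain must propagate the returned Chan*, which is exactly why a spot like the end of duppage(), where two Chans could come back, is awkward.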
What happens when an interrupt handler needs a lock that the
non-interrupt code holds at spllo()? Can you say "lock loop"? It seems
some care needs to be taken to find all spin locks that can be acquired
in interrupt handlers and make sure the base code uses ilock to guard
the lock with splhi().

Now that I have fixed most of the places that sleep while holding spin
locks, I can put considerable load on the system, and the next thing
that breaks is lock loops. This problem exists on all platforms, both
uniprocessor and multiprocessor. If an interrupt handler stops the
execution of the critical code on a CPU, you can get a lock loop: the
handler spins waiting for a lock that the code it interrupted can never
release. It is OK for an interrupt to occur on another CPU, since the
critical locked code will continue and release the lock while the
interrupt handler spins.

Before I start this journey, can you lend further pointers?

>ivan@ncu...

Thanks for the help.
David Butler
gdb@dbS...
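The lock loop described in this mail can be modeled on a single simulated CPU. This is a minimal user-space sketch, not kernel code: `splhi`/`splx` here only toggle an interrupt-enable flag, `interrupt()` stands in for the hardware raising an interrupt at a given instant, `devlock` and the `sr` field are hypothetical names, and the handler records a `lockloop` flag at the point where a real CPU would spin forever. It contrasts a plain lock() in the base code with an ilock() that raises the priority level before taking the lock.

```c
#include <assert.h>

/* One simulated CPU. 'ie' is the interrupt-enable flag. */
static int ie = 1;
static int pending;   /* an interrupt arrived while ie == 0 */
static int lockloop;  /* set where a real handler would spin forever */

typedef struct Lock { int key; int sr; } Lock;  /* sr: saved level (hypothetical) */

static Lock devlock;  /* hypothetical lock shared with an interrupt handler */

static void lock(Lock *l)   { assert(l->key == 0); l->key = 1; }
static void unlock(Lock *l) { l->key = 0; }

/* Interrupt handler that needs devlock. If the code it interrupted holds
   devlock, that code cannot resume on this CPU, so the spin never ends. */
static void handler(void)
{
    if (devlock.key) {
        lockloop = 1;  /* simulation: detect instead of spinning */
        return;
    }
    lock(&devlock);
    /* ... service the device ... */
    unlock(&devlock);
}

/* Model of the hardware raising an interrupt right now. */
static void interrupt(void)
{
    if (ie)
        handler();
    else
        pending = 1;
}

static int splhi(void) { int s = ie; ie = 0; return s; }

/* Restore the level; deliver any interrupt that was held off. */
static void splx(int s)
{
    ie = s;
    if (ie && pending) {
        pending = 0;
        handler();
    }
}

/* ilock: raise the level first, then take the lock, so no interrupt on
   this CPU can run while the lock is held. */
static void ilock(Lock *l)   { int s = splhi(); lock(l); l->sr = s; }
static void iunlock(Lock *l) { int s = l->sr; unlock(l); splx(s); }

/* Base-level critical sections with an interrupt arriving mid-way. */
static void critical_plain(void) { lock(&devlock); interrupt(); unlock(&devlock); }
static void critical_ilock(void) { ilock(&devlock); interrupt(); iunlock(&devlock); }
```

With plain lock(), the interrupt runs inside the critical section and finds devlock held, which is the lock loop. With ilock(), the interrupt stays pending until iunlock() lowers the level again, by which time the lock is free. On a multiprocessor the same handler running on another CPU would simply spin until the holder releases, which is the benign case the mail describes.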