From: miller@ham...
Subject: Re: [9fans] Kernel question: i386 test-and-set problem
Date: Thu, 20 Jul 2000 14:54:57 BST

jmk@pla... writes:

> The sleep/wakeup/postnote Rendez structure still has a lock which
> protects it, it just moved somewhere else.

Sorry, I didn't explain in enough detail.  In /sys/src/9/port/proc.c:588
wakeup() looks at r->p (pointer from Rendez to sleeping process)
without first acquiring any lock.  That's the unprotected access I was
referring to: it's dangerous because r->p is shared asynchronously
by sleep() and postnote().
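
To make the hazard concrete, here is a toy, single-threaded replay of
one interleaving in which an unlocked look at r->p goes wrong.  It is a
sketch, not the kernel code: every toy_* name is made up for this
message, wakeup is split by hand into a "peek" and an "act" step so the
interleaving can be forced, and the real wakeup() may re-check things
that this toy does not.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Proc Proc;
typedef struct Rendez Rendez;

enum { Running, Wakeme, Ready };

struct Rendez { Proc *p; };               /* r->p: the sleeper, or NULL */
struct Proc   { Rendez *r; int state; };  /* p->r: where p sleeps, or NULL */

/* Sleeper side (hypothetical): record the link in both directions. */
static void toy_sleep(Rendez *r, Proc *p)
{
    r->p = p;
    p->r = r;
    p->state = Wakeme;
}

/* postnote side (hypothetical): break the link, make p runnable. */
static void toy_postnote(Proc *p)
{
    if (p->r != NULL) {
        p->r->p = NULL;
        p->r = NULL;
        p->state = Ready;
    }
}

/* First half of an unlocked wakeup: sample r->p with no lock held. */
static Proc *toy_wakeup_peek(Rendez *r)
{
    return r->p;
}

/* Second half: act on the sampled pointer without re-checking it. */
static void toy_wakeup_act(Proc *p)
{
    if (p != NULL)
        p->state = Ready;
}

/* Replays one bad interleaving.  Returns 1 if a process that is
 * sleeping on r2 was marked Ready by a wakeup aimed at r1. */
static int toy_stale_wakeup(void)
{
    Rendez r1 = { NULL }, r2 = { NULL };
    Proc p = { NULL, Running };

    toy_sleep(&r1, &p);
    Proc *seen = toy_wakeup_peek(&r1);  /* wakeup(r1) samples r1.p */
    toy_postnote(&p);                   /* note arrives, link is broken */
    toy_sleep(&r2, &p);                 /* p goes back to sleep, on r2 */
    toy_wakeup_act(seen);               /* stale pointer is acted on */

    return p.state == Ready && p.r == &r2 && r2.p == &p;
}
```

Calling toy_stale_wakeup() returns 1: p ends up marked Ready while both
p->r and r2.p still say it is asleep on r2.  Holding a lock across the
peek and the act is what closes that window.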

The original 2nd edition kernel (CD version) had a lock in the Rendez
structure, and all accesses to r->p were protected by acquiring
the lock first.  However, p->r (pointer from sleeping process to Rendez)
was shared between sleep() and postnote() without locking.

A later kernel update (845586056.rc) introduced a new lock in the Proc
structure (p->rlock) to protect the shared access to p->r, but eliminated
the lock in the Rendez structure.  This left r->p exposed again.  I believe
that's why you need coherence() calls.

> The 2nd Edition code would
> have needed coherence() calls too, but in different places, had it not
> been rewritten before we tried running on a multiprocessor Pentium Pro.

When I added MP support to the 2nd edition for my dual Pentium Pro system,
I reinstated the Rendez lock, and kept p->rlock as well, so in the
three-way conversation between sleep(), wakeup() and postnote() both
r->p and p->r are protected.  I didn't add any explicit coherence()
calls anywhere, and the system has been running stably for over two years.
If I remove the lock around the r->p access in wakeup(), a few simultaneous
'du -a /' commands will quickly cause a crash.
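
For illustration only, the two-lock arrangement might be sketched like
this in portable C11.  This is not my kernel patch: the sk_* names, the
Lock type built on atomic_flag, and the lock ordering (Rendez lock
first, then p->rlock, with postnote backing off when it cannot get the
Rendez lock) are all assumptions made for the sketch, and scheduling is
reduced to setting a state field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set spin lock standing in for the kernel's. */
typedef struct Lock { atomic_flag held; } Lock;
typedef struct Proc Proc;
typedef struct Rendez Rendez;

enum { Running, Wakeme, Ready };

struct Rendez { Lock lock;  Proc *p; };              /* lock guards r->p  */
struct Proc   { Lock rlock; Rendez *r; int state; }; /* rlock guards p->r */

static int  sk_canlock(Lock *l) { return !atomic_flag_test_and_set(&l->held); }
static void sk_lock(Lock *l)    { while (!sk_canlock(l)) { /* spin */ } }
static void sk_unlock(Lock *l)  { atomic_flag_clear(&l->held); }

/* Sleeper: link r and p with both locks held (Rendez lock first). */
static void sk_sleep(Rendez *r, Proc *p)
{
    sk_lock(&r->lock);
    sk_lock(&p->rlock);
    r->p = p;
    p->r = r;
    p->state = Wakeme;
    sk_unlock(&p->rlock);
    sk_unlock(&r->lock);
}

/* wakeup: r->p is only read with the Rendez lock held.  Returns 1 if
 * a sleeper was readied, 0 if nobody was sleeping on r. */
static int sk_wakeup(Rendez *r)
{
    int woke = 0;

    sk_lock(&r->lock);
    Proc *p = r->p;
    if (p != NULL) {
        sk_lock(&p->rlock);
        if (p->r == r && p->state == Wakeme) {
            r->p = NULL;
            p->r = NULL;
            p->state = Ready;
            woke = 1;
        }
        sk_unlock(&p->rlock);
    }
    sk_unlock(&r->lock);
    return woke;
}

/* postnote: starts from the Proc, so it takes p->rlock first and only
 * *tries* the Rendez lock, retrying from scratch if it is busy, so the
 * two lock orders cannot deadlock.  Returns 1 if a sleep was broken. */
static int sk_postnote(Proc *p)
{
    for (;;) {
        sk_lock(&p->rlock);
        Rendez *r = p->r;
        if (r == NULL) {                /* not sleeping: nothing to break */
            sk_unlock(&p->rlock);
            return 0;
        }
        if (sk_canlock(&r->lock)) {     /* got both locks */
            r->p = NULL;
            p->r = NULL;
            p->state = Ready;
            sk_unlock(&r->lock);
            sk_unlock(&p->rlock);
            return 1;
        }
        sk_unlock(&p->rlock);           /* back off and try again */
    }
}

/* Single-threaded walk through the uncontended paths; returns the
 * number of checks that failed. */
static int sk_demo(void)
{
    Rendez r = { { ATOMIC_FLAG_INIT }, NULL };
    Proc p = { { ATOMIC_FLAG_INIT }, NULL, Running };
    int bad = 0;

    sk_sleep(&r, &p);
    bad += sk_wakeup(&r) != 1;      /* sleeper found and readied */
    bad += sk_postnote(&p) != 0;    /* nothing left to interrupt */

    sk_sleep(&r, &p);
    bad += sk_postnote(&p) != 1;    /* note breaks the sleep */
    bad += sk_wakeup(&r) != 0;      /* wakeup finds no sleeper */
    bad += (r.p != NULL) + (p.r != NULL);
    return bad;
}
```

sk_demo() only walks the uncontended paths on one thread, so it shows
the bookkeeping rather than reproducing the 'du -a /' stress case.  The
point is simply that every read or write of r->p happens under the
Rendez lock and every read or write of p->r happens under p->rlock.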

-- Richard Miller