From: presotto@pla... presotto@pla...
Subject: Pentium Pro and coherence
Date: Mon, 21 Apr 1997 10:33:10 -0400
Sorry for yet another long message...
→ From: hamnavoe.demon.co.uk!miller
To: cse.psu.edu!9fans
Subject: Re: porting linux programs and drivers to plan9
presotto@pla... writes:
> [a fascinating account of how the Pentium Pro's out-of-order
> instruction execution breaks the Plan 9 sleep/wakeup code on
> a multi-CPU system]
I didn't write those words. I may have written what
accompanied them but not having seen the message, I don't
know.
The exact ordering I gave in my last mail was impossible
because of the locks. An equally illustrative
(and this time actually possible) version follows.
        sleeper                              waker

                                             wakeup_condition = 1;
        p = u->p;
        lock(&p->rlock);
        r->p = p;  /* put myself in the
                      rendezvous structure */
   A:   if(wakeup_condition){
            r->p = 0;  /* no need to sleep */
            unlock(&p->rlock);
            return;
        } else {
            /* go to sleep */
            p->state = Wakeme;
            p->r = r;
            unlock(&p->rlock);
                                             p = r->p;
                                        B:   if(p == 0)
                                                 return;
                                             lock(&p->rlock);
                                             if(r->p == p && p->r == r){
                                                 r->p = 0;
                                                 p->r = 0;
                                                 ready(p);
                                             }
                                             unlock(&p->rlock);
            sched();
        }
The ordering of the critical instructions is the same but at least
this time I got the ordering of the locked pieces right. The critical
points are A and B. With speculative reads, both r->p and
wakeup_condition may appear to be 0 (depending on what lock()
does or doesn't do).
→ It appears that the slightly different version of sleep/wakeup
given in the Volume 2 paper `Process Sleep and Wakeup on a
Shared-memory Multiprocessor' should be immune to the effects
of weak memory coherency, because the shared variables are
referenced only inside a lock/unlock pair. Is this right?
I'm not sure. It depends a bit on what we believe fixes
the coherence. We don't really know what's happening inside the
Pro; we're just guessing. We're not even certain that speculative
reads are the problem. The Pro people have remained silent
on the subject (we've sent email).
Assuming that it was indeed speculative reads, the simplest mechanism
that I can imagine Intel having provided is to cancel speculative
reads whenever an interlocking instruction is encountered.
If this is indeed the case, then keeping every reference to the
shared variables between locks would be sufficient.
(Unfortunately, we don't do that
because of the interaction between postnote and sleep/wakeup. Postnote
doesn't know what r is without first looking at p->r outside of any
possible lock. We could fix sleep/wakeup by moving the problem so
that it sits between sleep and postnote. However, it would be the same
problem. This is perhaps another story.)
Of course, I could be totally wrong about the speculative reads and
it may be the interlock instruction on the writer and not the
reader that causes the processors to become coherent. In that case, at the
very least, we'd have to make unlock() end with an interlocking
instruction. The released version just sets 'l->val = 0'.
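To make the difference concrete, here is a minimal sketch in modern C11
atomics, which did not exist when this was written. It is not the
released kernel code: the Lock layout is a hypothetical stand-in (only
the field name val comes from the text above), and the use of
atomic_exchange is an assumption about what "end with an interlocking
instruction" could look like.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's Lock; only the name val is from the text. */
typedef struct Lock Lock;
struct Lock {
	atomic_int val;	/* 0 = free, 1 = held */
};

/* Spin until we swap a 0 out for a 1; the exchange itself is interlocked. */
void
lock(Lock *l)
{
	while(atomic_exchange(&l->val, 1) != 0)
		;
}

/* Try once; returns 1 if we took the lock, 0 if it was already held. */
int
canlock(Lock *l)
{
	return atomic_exchange(&l->val, 1) == 0;
}

/*
 * Release with an interlocked exchange instead of a plain l->val = 0.
 * On x86 a sequentially consistent exchange is typically compiled to
 * xchg, which is a locked instruction.
 */
void
unlock(Lock *l)
{
	atomic_exchange(&l->val, 0);
}
```

The only change from a plain-store release is the last function: the
store that frees the lock is itself an interlocked operation.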
We have discovered empirically that performing an interlock instruction
between setting one shared variable and looking at the other seems
sufficient. Nothing less seemed to work for us. Putting everything
back inside the locks might have worked, but we didn't do that because
of postnote().
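The pattern described here, set one shared variable, interlock, then
look at the other, can be sketched in C11 terms as follows. This is an
illustration under assumptions, not the code we run: rendez_occupied
and the two function names are hypothetical stand-ins for r->p and for
the critical points A and B, and atomic_thread_fence plays the role of
the interlocking instruction.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the two shared variables in the example. */
static atomic_int rendez_occupied;	/* plays the role of r->p != 0 */
static atomic_int wakeup_condition;

/*
 * Sleeper side (point A): announce ourselves, fence, then look at the
 * condition. Returns 1 if the condition is already set, so there is
 * no need to sleep.
 */
int
sleeper_check(void)
{
	atomic_store_explicit(&rendez_occupied, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the "interlock" */
	return atomic_load_explicit(&wakeup_condition, memory_order_relaxed);
}

/*
 * Waker side (point B): set the condition, fence, then look for a
 * sleeper. Returns 1 if a sleeper has announced itself.
 */
int
waker_check(void)
{
	atomic_store_explicit(&wakeup_condition, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the "interlock" */
	return atomic_load_explicit(&rendez_occupied, memory_order_relaxed);
}
```

With a sequentially consistent fence between the store and the load on
both sides, the C11 memory model rules out the outcome where both
functions return 0, which is the lost-wakeup case. Whether the fence
maps onto what the Pro actually needs is exactly the part we are
guessing at.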
Since we're paranoids, we now perform an interlocking instruction
before checking the state variables in sleep() and wakeup() AND
at the end of unlock(). Everywhere else, we seem to be following
a strict policy of changing or looking at shared things only inside
lock/unlock.
→ Perhaps the moral is that it's better to be conservative with
locks than to trust hardware designers to do what we expect.
I certainly agree. We are going to encounter more relaxed ordering
in multiprocessors. The question is, what do the hardware
designers consider conservative? Forcing an interlock
at both the beginning and end of a locked section seems to be
pretty conservative to me, but I clearly am not imaginative
enough. The Pro manuals go into excruciating detail in describing
the caches and what keeps them coherent but don't seem to care
to say anything detailed about execution or read ordering. The
truth is that we have no way of knowing whether we're conservative
enough.