9fans archive / 2008 / 08 / 389

From: cinap_lenrek@gmx.de
Subject: Re: [9fans] notes and traps
Date: Sat, 30 Aug 2008 04:18:50 +0200

> > i can reproduce it with this:
> > 
> > http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/
> > 
> > 8c test.c
> > 8a int80.s
> > 8l test.8 int80.8
> > ./8.out
> > 
> > 8.out 12490667: suicide: sys: trap: general protection violation pc=0x00001333
> 
> okay.  it seems pretty clear from the code that you're dead meat
> if you receive a note while you're in the note handler.  that is,
> up->notified = 1. 

No! Notes are buffered in the up->note[] array. If you are in the note handler,
another process *can* send you further (NUser) notes without doing any harm.

If we are in the note handler (up->notified == 1) and notify() is called while
the first queued note is an NUser note, it does nothing and returns 0; see:

/sys/src/9/pc/trap.c: notify()
...
	if(n->flag!=NUser && (up->notified || up->notify==0)){
		if(n->flag == NDebug)
			pprint("suicide: %s\n", n->msg);
		qunlock(&up->debug);
		pexit(n->msg, n->flag!=NDebug);
	}

	if(up->notified){
		qunlock(&up->debug);
		splhi();
		return 0;
	}
...

The problem is when we get an NDebug note *after* an NUser note. After
notify() has popped the first NUser note and put the process into the
user handler, the NDebug note becomes the first entry (up->note[0]). From
then on, any (indirect) call to notify() kills us, because it assumes that
while handling the last note (up->notified == 1) we caused a trap or other
fatal event (up->note[0].flag != NUser). But that was *not* the case here!
We just trapped after some other process had put a note in our queue.

The notify() code that detects a trap inside the note handler is fine, I think.
What's wrong is that the trap note got queued after the NUser note.
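To make the sequence concrete, here is a minimal user-space sketch of the note queue. Proc, Note, sim_postnote() and sim_notify() are hypothetical simplified stand-ins, not kernel code: they keep only the queueing test from postnote() in port/proc.c and the two checks from the notify() excerpt above, NNOTE and ERRMAX are assumed to be 5 and 128, and pexit() is replaced by setting a dead flag.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of the Plan 9 note queue (not kernel code). */
enum { NNOTE = 5, ERRMAX = 128 };	/* queue bound and message size (assumed) */
enum { NUser, NExit, NDebug };		/* note flags, modeled on portdat.h */

typedef struct Note {
	char	msg[ERRMAX];
	int	flag;
} Note;

typedef struct Proc {
	Note	note[NNOTE];
	int	nnote;		/* number of queued notes */
	int	notified;	/* 1 while the user note handler runs */
	int	notify;		/* 1 if a note handler is registered */
	int	dead;		/* stands in for pexit() */
} Proc;

/* Queue a note using the original test from postnote(). */
int
sim_postnote(Proc *p, const char *msg, int flag)
{
	/* the queue is only flushed when there is no handler or we are inside it */
	if(flag != NUser && (p->notify == 0 || p->notified))
		p->nnote = 0;
	if(p->nnote >= NNOTE)
		return 0;	/* queue full: note refused */
	strncpy(p->note[p->nnote].msg, msg, ERRMAX-1);
	p->note[p->nnote].msg[ERRMAX-1] = '\0';
	p->note[p->nnote++].flag = flag;
	return 1;
}

/* The two checks from notify(), then delivery of note[0] to the handler. */
int
sim_notify(Proc *p)
{
	Note *n;

	assert(p->nnote >= 0 && p->nnote <= NNOTE);	/* queue invariant */
	if(p->nnote == 0)
		return 0;
	n = &p->note[0];
	if(n->flag != NUser && (p->notified || p->notify == 0)){
		p->dead = 1;	/* kernel: pexit(n->msg, ...), i.e. suicide */
		return 0;
	}
	if(p->notified)
		return 0;	/* inside the handler: leave notes queued */
	p->nnote--;		/* pop note[0] and enter the handler */
	memmove(&p->note[0], &p->note[1], p->nnote*sizeof(Note));
	p->notified = 1;
	return 1;
}
```

Posting an NUser note and then an NDebug trap note to a process with a handler leaves the queue as [NUser, NDebug]. The first sim_notify() call delivers the NUser note and sets notified; the next call finds the trap note at note[0] while notified is 1 and sets dead, even though the handler itself never trapped.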

> it looks pretty clear that this is intentional.
> i don't see why one couldn't get 3-4 note before the note handler
> is called, however.
> 
> given this, calling sleep() from the note handler is an especially
> bad idea.
> 
> however, on a multiprocessor (or if you get scheduled by a clock
> tick on a up), you're still vulnerable.  this is akin to hitting ^c
> twice quickly — and watching one's shell exit.
> 
> it would be good to track down what's really going on in your
> vm.  how many processors does plan 9 think it has?

just one :-)

> i did some looking to see if i could find any discussions on the
> implementation of notes and didn't find anything in my quick scan.
> it would be very interesting to have a little perspective from someone
> who was there.

I have done further experiments and changed postnote() in
/sys/src/9/port/proc.c from:
...
	if(flag != NUser && (p->notify == 0 || p->notified))
		p->nnote = 0;
...
to:
...
	if(flag != NUser)
		p->nnote = 0;
...
which lets the test case run without any suicides.

What it does is ensure (in a harsh way) that a trap always ends up
as the first entry in the up->note array, not only when the destination
process is currently inside the note handler, no matter what NUser notes
we received before. It does so simply by discarding every note already
queued.

A trap caused by the note handler itself will still make the
process suicide, which is correct.
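The effect of the patch on the queue can be checked with a similar sketch. Proc, Note and sim_postnote_hack() are hypothetical simplified stand-ins (NNOTE and ERRMAX assumed to be 5 and 128); the only line taken from the message is the unconditional flush for non-NUser notes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of the note queue (not kernel code). */
enum { NNOTE = 5, ERRMAX = 128 };	/* assumed sizes */
enum { NUser, NExit, NDebug };		/* note flags, modeled on portdat.h */

typedef struct Note {
	char	msg[ERRMAX];
	int	flag;
} Note;

typedef struct Proc {
	Note	note[NNOTE];
	int	nnote;		/* number of queued notes */
} Proc;

/* postnote() with the patched test: any non-NUser note flushes the queue. */
int
sim_postnote_hack(Proc *p, const char *msg, int flag)
{
	assert(p->nnote >= 0 && p->nnote <= NNOTE);	/* queue invariant */
	if(flag != NUser)
		p->nnote = 0;	/* discards every note queued so far */
	if(p->nnote >= NNOTE)
		return 0;	/* queue full: note refused */
	strncpy(p->note[p->nnote].msg, msg, ERRMAX-1);
	p->note[p->nnote].msg[ERRMAX-1] = '\0';
	p->note[p->nnote++].flag = flag;
	return 1;
}
```

With the patch, posting an NUser note and then a trap note leaves only the trap note queued (nnote == 1, note[0].flag == NDebug). The trap is now first in line, but the earlier NUser note is lost; that is the harshness the hack trades for avoiding the spurious suicide.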

This is just a hack. It would be better to keep the other notes,
shift them one slot down, and then put the new note into the first
entry if it is not an NUser note.
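The better fix suggested here (keep the queued notes, shift them down one slot, put a non-NUser note first) might look like the following sketch. sim_postnote_front() and the simplified structures are hypothetical, NNOTE and ERRMAX are assumed to be 5 and 128, and dropping the newest queued note when the queue is already full is an assumption of this sketch, not something the message specifies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of the note queue (not kernel code). */
enum { NNOTE = 5, ERRMAX = 128 };	/* assumed sizes */
enum { NUser, NExit, NDebug };		/* note flags, modeled on portdat.h */

typedef struct Note {
	char	msg[ERRMAX];
	int	flag;
} Note;

typedef struct Proc {
	Note	note[NNOTE];
	int	nnote;		/* number of queued notes */
} Proc;

/*
 * Queue a note without discarding earlier ones: NUser notes are
 * appended; any other note is inserted at note[0] after shifting
 * the existing entries down one slot.
 */
int
sim_postnote_front(Proc *p, const char *msg, int flag)
{
	int i;

	assert(p->nnote >= 0 && p->nnote <= NNOTE);	/* queue invariant */
	if(flag == NUser){
		if(p->nnote >= NNOTE)
			return 0;	/* full: refuse, as postnote() does */
		i = p->nnote;
	}else{
		if(p->nnote == NNOTE)
			p->nnote--;	/* assumption: drop the newest note to make room */
		memmove(&p->note[1], &p->note[0], p->nnote*sizeof(Note));
		i = 0;
	}
	strncpy(p->note[i].msg, msg, ERRMAX-1);
	p->note[i].msg[ERRMAX-1] = '\0';
	p->note[i].flag = flag;
	p->nnote++;
	return 1;
}
```

Posting an NUser note and then a trap note now leaves the queue as [trap, NUser]: the trap is first, so notify() sees it before the handler is entered, and the NUser note stays queued instead of being discarded.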

What do you think?

> - erik

--
cinap