replace the flockfile backend with a per-FILE recursive mutex.
the flockfile implementation in thread/rthread_file.c used an
external lock, and associated it with the relevant FILE * as needed.
this wasn't great for a lot of reasons, complexity being the big
one, but the straw that broke the camel's back is that it used a
single spinlock to coordinate all of this, which in turn generated
a lot of sched_yield syscalls under contention.
this avoids all the code complexity and the spinlock by just embedding
a small recursive mutex (struct __rcmtx) in every FILE.
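to illustrate the idea, here is a minimal sketch of a recursive mutex embedded in a FILE-like object. it is not the libc code: it assumes a plain pthread mutex underneath instead of the libc-internal primitives, and the names rmtx, hypo_file, self_token and run_demo are hypothetical. the owner token plus depth counter is what lets flockfile() followed by a nested stdio call on the same FILE re-enter without deadlocking, and because each object carries its own lock there is no shared table and no global spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical recursive mutex: a plain (non-recursive) pthread mutex
 * plus an owner token and a recursion depth. */
struct rmtx {
	pthread_mutex_t mtx;    /* underlying non-recursive lock */
	_Atomic(void *) owner;  /* token of the holding thread, or NULL */
	unsigned int depth;     /* recursion depth; only the owner touches it */
};

/* The address of a thread-local object differs between live threads,
 * so it works as an owner token without a "null" pthread_t value. */
static _Thread_local char self_token;

static void rmtx_init(struct rmtx *r)
{
	pthread_mutex_init(&r->mtx, NULL);
	atomic_init(&r->owner, NULL);
	r->depth = 0;
}

static void rmtx_enter(struct rmtx *r)
{
	void *self = &self_token;

	/* Only this thread ever stores its own token, so seeing it means
	 * we already hold the lock: just count the extra entry. */
	if (atomic_load_explicit(&r->owner, memory_order_relaxed) == self) {
		r->depth++;
		return;
	}
	pthread_mutex_lock(&r->mtx);
	atomic_store_explicit(&r->owner, self, memory_order_relaxed);
	r->depth = 1;
}

static void rmtx_leave(struct rmtx *r)
{
	assert(atomic_load_explicit(&r->owner, memory_order_relaxed) == &self_token);
	assert(r->depth > 0);
	if (--r->depth == 0) {
		/* Clear ownership before releasing the underlying lock. */
		atomic_store_explicit(&r->owner, NULL, memory_order_relaxed);
		pthread_mutex_unlock(&r->mtx);
	}
}

/* Hypothetical FILE stand-in: the lock lives inside the object. */
struct hypo_file {
	struct rmtx lock;
	long counter;
};

static void *worker(void *arg)
{
	struct hypo_file *f = arg;

	for (int i = 0; i < 100000; i++) {
		rmtx_enter(&f->lock);  /* outer, like flockfile() */
		rmtx_enter(&f->lock);  /* nested, like a stdio call inside it */
		f->counter++;
		rmtx_leave(&f->lock);
		rmtx_leave(&f->lock);  /* outer, like funlockfile() */
	}
	return NULL;
}

/* Run nthreads (at most 8) workers against one object; return the count. */
static long run_demo(int nthreads)
{
	struct hypo_file f;
	pthread_t tids[8];

	if (nthreads > 8)
		nthreads = 8;
	rmtx_init(&f.lock);
	f.counter = 0;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &f);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&f.lock.mtx);
	return f.counter;
}
```

with four threads, run_demo(4) should return 400000: every nested increment happens under the object's own lock, and contending threads sleep in pthread_mutex_lock() rather than spinning on a shared lock.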
tested by and ok tb@ jca@
ok claudio@