I am working through the MIT 6.828 (2018) course on Operating Systems Engineering, and I like it a lot. It is fun and challenging, and I have learned a lot of basic OS knowledge from it. I am now struggling with the fine-grained locking challenge in lab 4: https://pdos.csail.mit.edu/6.828/2018/labs/lab4/
But when I run make run-primes-nox CPUS=4, forking the child fails. I suspect the kernel stack data gets corrupted or overwritten during scheduling. The parent sometimes does not return from the fork system call.
In the scheduler, before making a round, I acquire a lock (lock_scheduler();) to prevent other CPUs from accessing the process list:
    int i = 1, curpos = -1, k = 0;
    if (curenv)
        curpos = ENVX(curenv->env_id);
    lock_scheduler();
    for (; i < NENV; i++)
    {
        k = (i + curpos) % NENV; // in a circular way
        if (envs[k].env_status == ENV_RUNNABLE)
        {
            env_run(&envs[k]);
        }
    }
    if (curenv != NULL && curenv->env_status == ENV_RUNNING)
    {
        env_run(curenv);
    }
    // sched_halt never returns
    sched_halt();
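To make the index arithmetic of the loop above easier to check, here is a small standalone sketch that records which slots the loop visits. It is not part of JOS: NENV_DEMO and scan_order are hypothetical names, and the table size is shrunk to 8 for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_DEMO 8   /* hypothetical small table size; JOS uses NENV */

/* Record, in order, the slots visited by the scheduler loop as posted.
 * curpos is -1 when there is no current environment.
 * Returns the number of slots written to out. */
static size_t scan_order(int curpos, int out[NENV_DEMO])
{
    size_t n = 0;

    assert(curpos >= -1 && curpos < NENV_DEMO);
    for (int i = 1; i < NENV_DEMO; i++) {
        int k = (i + curpos) % NENV_DEMO;   /* same formula as the post */
        out[n++] = k;
    }
    return n;
}
```

With a current environment in slot 5, the scan visits 6, 7, 0, 1, 2, 3, 4 and leaves slot 5 to the curenv check after the loop. With curpos = -1 the scan visits slots 0 through 6 only, so as written the last slot is not examined when a CPU has no current environment. That may or may not matter for the fork failure, but it is worth keeping in mind.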
In sched_halt, or when about to call env_run, we release whichever locks this CPU holds:
    if (kernel_lock.locked && kernel_lock.cpu == thiscpu)
        unlock_kernel();
    if (scheduler_lock.locked && scheduler_lock.cpu == thiscpu)
        unlock_scheduler();
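The "release only if this CPU is the holder" check above can be sketched in isolation as follows. This is a minimal user-space model using C11 atomics, not the JOS spinlock itself: struct demo_lock and all demo_* functions are hypothetical names introduced for illustration, and CPU ids are plain integers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock that remembers its holder. */
struct demo_lock {
    atomic_uint locked;   /* 0 = free, 1 = held */
    atomic_int  cpu;      /* id of the holder, -1 when free */
};

/* True only if the lock is held AND held by the given CPU. */
static bool demo_holding(struct demo_lock *lk, int cpu)
{
    return atomic_load(&lk->locked) != 0 && atomic_load(&lk->cpu) == cpu;
}

static void demo_acquire(struct demo_lock *lk, int cpu)
{
    assert(!demo_holding(lk, cpu));              /* no recursive acquire */
    while (atomic_exchange(&lk->locked, 1) != 0)
        ;                                        /* spin until free */
    atomic_store(&lk->cpu, cpu);
}

static void demo_release(struct demo_lock *lk, int cpu)
{
    assert(demo_holding(lk, cpu));
    atomic_store(&lk->cpu, -1);                  /* clear owner first */
    atomic_store(&lk->locked, 0);
}

/* Mirrors the pattern in the post: a no-op unless this CPU is the holder. */
static void demo_release_if_held(struct demo_lock *lk, int cpu)
{
    if (demo_holding(lk, cpu))
        demo_release(lk, cpu);
}
```

In this model, a call such as demo_release_if_held(&lk, 0) does nothing while CPU 2 holds lk, which is the behaviour the conditional unlock in the post relies on.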
When trapping into the kernel from an interrupt or a system call (explicitly with int $0x30), we take the original big kernel lock (BKL), and before leaving the trap, e.g. via env_run, we release the kernel lock:
    void
    trap(struct Trapframe *tf)
    {
        // The environment may have set DF and some versions
        // of GCC rely on DF being clear
        asm volatile("cld" ::: "cc");

        // Halt the CPU if some other CPU has called panic()
        extern char *panicstr;
        if (panicstr)
            asm volatile("hlt");

        // Re-acquire the big kernel lock if we were halted in
        // sched_yield()
        xchg(&thiscpu->cpu_status, CPU_STARTED);

        // Check that interrupts are disabled. If this assertion
        // fails, DO NOT be tempted to fix it by inserting a "cli" in
        // the interrupt path.
        assert(!(read_eflags() & FL_IF));

        // only apply in trap
        lock_kernel();
        ......
Currently:
- I hold the kernel_lock when trapping into the kernel from user space
- I use the page_lock to protect the page_free_list when allocating or deallocating memory
- I acquire the scheduler_lock on entering sched_yield and release it just before running any user process (env_pop_tf)
Sorry if this is not enough information. I have uploaded my workspace to GitHub:
https://github.com/k0Iry/6.828_2018_mit_jos
It contains all of my implementation from lab1 through lab4. Thanks for reviewing!
To reproduce the issue:
    git clone https://github.com/k0Iry/6.828_2018_mit_jos.git && cd 6.828_2018_mit_jos
    wget https://raw.githubusercontent.com/k0Iry/xv6-jos-i386-lab/master/labs/0001-trying-with-fine-grained-locks.patch
    git apply 0001-trying-with-fine-grained-locks.patch
    make run-primes-nox CPUS=4
The error appears while the processes are forking.