finishing up report

git-svn-id: svn://anubis/gvsu@408 45c1a28c-8058-47b2-ae61-ca45b979098e
josh 2009-04-16 02:49:37 +00:00
parent fbe0691460
commit ea4e68b769


@@ -206,6 +206,33 @@
In this way, one worker process was spawned per core on the
slave node.
</p>
<p>
I had originally planned to implement fault tolerance in the
distribution architecture by establishing a second TCP connection
from each slave node to the master, which would serve as a polling
connection to verify that the slaves were still alive.
During implementation, I arrived at a more elegant solution.
The master process was already keeping track of the set of tasks
that it considered "in progress".
If the master process received a request from a slave for a task
to work on, it would normally respond with the next available task
number until all tasks had been given out, and then it would
respond saying that there were no more tasks to work on.
I changed this slightly so that if the master got a request from
a slave for a task to work on, and all of the tasks were already
given out, then the master would respond to the slave with a
task ID from the set of tasks that were currently in progress.
That way, whether the original slave node or the new one finished
the task, the data for it would be collected.
If the original node was dead, then the new slave node would
take over the task and return the data.
If the original node was alive, but just responding very slowly,
then the replacement node could finish the task and return
the results before the original node.
This ended up working very well: when I killed all of
the worker processes on a given slave node, the tasks
that they had been working on were later finished by other nodes.
</p>
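<p>
The master-side dispatch logic described above can be sketched roughly
as follows, in Python. This is a minimal illustration, not the code
used in the project: the names TaskDispatcher, request_task and
submit_result are hypothetical, and two details are assumptions of
this sketch rather than something stated above, namely that
re-issued tasks are chosen by rotating through the in-progress
list, and that a second result for an already finished task is
simply discarded.
</p>

```python
class TaskDispatcher:
    """Master-side bookkeeping (hypothetical sketch).

    Hands out task numbers in order. Once every task has been given
    out, it re-issues tasks that are still in progress, so a dead or
    slow slave cannot hold up the final result.
    """

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.next_task = 0      # next task number never yet given out
        self.in_progress = []   # given out but not finished, oldest first
        self.results = {}       # task number mapped to its collected data

    def request_task(self):
        """Answer a slave's request; None means there is no more work."""
        if self.next_task != self.num_tasks:
            task = self.next_task
            self.next_task += 1
            self.in_progress.append(task)
            return task
        if self.in_progress:
            # Assumption: rotate through outstanding tasks so repeated
            # requests are spread across them.
            task = self.in_progress.pop(0)
            self.in_progress.append(task)
            return task
        return None

    def submit_result(self, task, data):
        """Store a result; return False if the task is not outstanding."""
        if task not in self.in_progress:
            # Assumption: a late duplicate (or unknown task) is ignored.
            return False
        self.in_progress.remove(task)
        self.results[task] = data
        return True


if __name__ == "__main__":
    d = TaskDispatcher(3)
    issued = [d.request_task() for _ in range(3)]   # tasks 0, 1, 2
    d.submit_result(0, "data-0")
    # All tasks are given out, so an idle slave gets an in-progress task.
    print("re-issued task:", d.request_task())
```

<p>
With this arrangement, whichever slave returns a task's result first
has its data collected, so no separate polling connection is needed
to detect a dead slave.
</p>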
<a name="evaluation" />
<h4>Evaluation</h4>