From ea4e68b769634d236c623996c9a67e425419e6b7 Mon Sep 17 00:00:00 2001
From: josh
Date: Thu, 16 Apr 2009 02:49:37 +0000
Subject: [PATCH] finishing up report

git-svn-id: svn://anubis/gvsu@408 45c1a28c-8058-47b2-ae61-ca45b979098e
---
 cs658/html/report.html | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/cs658/html/report.html b/cs658/html/report.html
index be683a8..6b464c2 100644
--- a/cs658/html/report.html
+++ b/cs658/html/report.html
@@ -206,6 +206,33 @@ In this way, one worker process was spawned per core on the slave node.

+

+ I had originally planned on implementing fault tolerance in the
+ distribution architecture by establishing a second TCP connection
+ from each slave node to the master, which would serve as a polling
+ connection to make sure that the slaves were still alive.
+ During implementation, I arrived at a more elegant solution.
+ I was already keeping track of the set of tasks that were
+ considered "in progress" as far as the master process was concerned.
+ When the master process received a request from a slave for a task
+ to work on, it would normally respond with the next available task
+ number until all tasks had been given out, and then it would
+ respond saying that there were no more tasks to work on.
+ I changed this slightly so that if the master got a request from
+ a slave for a task to work on, and all of the tasks had already been
+ given out, then the master would respond to the slave with a
+ task ID from the set of tasks that were currently in progress.
+ That way, whether the original slave node or the new one finished
+ the task, the data for it would be collected.
+ If the original node was dead, then the new slave node would
+ take over the task and return the data.
+ If the original node was alive but just responding very slowly,
+ then the replacement node could finish the task and return
+ the results before the original node.
+ This ended up working very well: I was able to kill all of
+ the worker processes on a given slave node, and the tasks
+ that they were working on were finished by other nodes later on.
+
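
The re-issue scheme described above can be sketched roughly as follows. This is a minimal Python illustration of the idea only, not the report's actual code: the class name `TaskMaster`, the methods `request_task` and `submit_result`, and the choice to re-issue the lowest-numbered unfinished task are all assumptions made for the example. The report does not say which in-progress task the master picks, or how duplicate results are handled.

```python
class TaskMaster:
    """Hypothetical master-side bookkeeping for task hand-out and re-issue."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.next_unassigned = 0   # next task number never handed out so far
        self.in_progress = set()   # handed out, but no result collected yet
        self.results = {}          # task ID -> collected data

    def request_task(self):
        """Answer a slave's request for work; None means nothing is left."""
        if self.next_unassigned < self.num_tasks:
            task_id = self.next_unassigned
            self.next_unassigned += 1
            self.in_progress.add(task_id)
            return task_id
        if self.in_progress:
            # All tasks have been given out: re-issue one that is still
            # unfinished. Picking the lowest ID is an assumption.
            return min(self.in_progress)
        return None

    def submit_result(self, task_id, data):
        """Keep the first result for a task; ignore later duplicates."""
        if task_id not in self.in_progress:
            return False
        self.in_progress.discard(task_id)
        self.results[task_id] = data
        return True


# Three tasks are handed out, then the worker holding task 1 dies.
master = TaskMaster(3)
handed_out = [master.request_task() for _ in range(3)]
master.submit_result(0, "data-0")
master.submit_result(2, "data-2")

# The next idle slave is given the unfinished task and completes it.
retry = master.request_task()
master.submit_result(retry, "data-1")
```

In this sketch a slow but living original worker would still send its result eventually; `submit_result` simply returns `False` for it, since the data has already been collected.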

Evaluation