diff --git a/cs658/html/report.html b/cs658/html/report.html
index f15276e..22f0fac 100644
--- a/cs658/html/report.html
+++ b/cs658/html/report.html
@@ -114,7 +114,7 @@

 To create a scalable parser to read scene files according to
 a grammar that I supplied, I used the bison parser generator.
-I used bison because it is very portable and worked directly
+I chose bison because it is very portable and worked directly
 with my C++ code.
 The parser produces a hierarchy of Node objects.
 Each Node object represents some part of the parse tree
@@ -123,15 +123,106 @@
 in order to evaluate all of the scene objects specified in the
 scene file, maintaining node order and parent/child relationships.
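+
+ For illustration only, the sketch below shows the kind of Node
+ hierarchy described above; the class is a simplified stand-in,
+ and its member names are my assumptions rather than the actual
+ interface.
+
+     // Illustrative sketch only, not the actual Node class: a parse-tree
+     // node that keeps its children in parse order and records its parent.
+     #include <string>
+     #include <vector>
+
+     class Node {
+     public:
+         explicit Node(const std::string &type) : type_(type), parent_(NULL) {}
+
+         // Children are stored in the order they were parsed, so walking
+         // them front to back preserves the order of scene-file objects.
+         void addChild(Node *child) {
+             child->parent_ = this;
+             children_.push_back(child);
+         }
+
+         void evaluate() {
+             // evaluate this node, then its children in order (placeholder)
+             for (size_t i = 0; i < children_.size(); ++i)
+                 children_[i]->evaluate();
+         }
+
+     private:
+         std::string type_;
+         Node *parent_;
+         std::vector<Node *> children_;
+     };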

+

+ I designed the task distribution architecture using a combination
+ of threads and processes to achieve a working master/slave setup.
+ When the program starts, it determines whether or not it is
+ running in distributed mode.
+ If it is not running in distributed mode (no --hosts or --host
+ argument was supplied), then the scene is rendered sequentially,
+ one pixel at a time.
+ If, however, a --hosts or --host option is present, then the
+ program is running in distributed mode.
+ If a --hosts option is present, then the process is the master
+ process and it reads, from the specified file, the list of hosts
+ on which to run slave processes.
+ If --host and --port options are provided, then the process is
+ a slave process and it connects to the master process at the
+ given hostname on the given TCP port.
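+
+ As a rough illustration (this is a simplified sketch, not the
+ project's actual source), the mode decision described above could
+ look like the following; the option parsing and the placeholder
+ comments are my assumptions.
+
+     #include <stdlib.h>
+     #include <string.h>
+     #include <string>
+
+     int main(int argc, char **argv) {
+         std::string hostsFile, masterHost;
+         int masterPort = 0;
+         for (int i = 1; i < argc; ++i) {
+             if (!strcmp(argv[i], "--hosts") && i + 1 < argc)
+                 hostsFile = argv[++i];
+             else if (!strcmp(argv[i], "--host") && i + 1 < argc)
+                 masterHost = argv[++i];
+             else if (!strcmp(argv[i], "--port") && i + 1 < argc)
+                 masterPort = atoi(argv[++i]);
+         }
+         if (!hostsFile.empty()) {
+             // master: read the slave host list, start the server thread,
+             // and launch one copy of the program per host over ssh
+         } else if (!masterHost.empty() && masterPort != 0) {
+             // slave: connect back to the master at masterHost:masterPort
+         } else {
+             // local mode: render the scene sequentially, one pixel at a time
+         }
+         return 0;
+     }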

+

+ In distributed mode, the master process creates a server thread.
+ This server thread listens on a free TCP port (chosen by the
+ system) for incoming connections.
+ Each time a slave node connects to this port, another connection
+ thread is spawned to handle the master's side of the communication
+ with that slave node.
+ After the master process starts the listen thread, it also forks
+ and execs an ssh process as a sub-process.
+ This ssh process connects to the slave node and starts a copy of
+ the program there, informing it of the master node's hostname and
+ port via the --host and --port command-line options.
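+
+ The following is a minimal sketch, under my own naming, of how the
+ master could obtain a system-chosen listening port and launch one
+ slave over ssh; open_ephemeral_listener, launch_slave, and the
+ program path are hypothetical, and error handling is omitted.
+
+     #include <arpa/inet.h>
+     #include <netinet/in.h>
+     #include <sys/socket.h>
+     #include <unistd.h>
+     #include <stdint.h>
+     #include <string>
+
+     // Open a listening TCP socket on a port chosen by the system and
+     // report which port was picked.
+     int open_ephemeral_listener(uint16_t &port_out) {
+         int fd = socket(AF_INET, SOCK_STREAM, 0);
+         sockaddr_in addr = {};
+         addr.sin_family = AF_INET;
+         addr.sin_addr.s_addr = INADDR_ANY;
+         addr.sin_port = 0;                  // 0 = let the system pick a free port
+         bind(fd, (sockaddr *)&addr, sizeof(addr));
+         listen(fd, 16);
+         socklen_t len = sizeof(addr);
+         getsockname(fd, (sockaddr *)&addr, &len);  // recover the chosen port
+         port_out = ntohs(addr.sin_port);
+         return fd;
+     }
+
+     // Fork and exec an ssh sub-process that starts the program on the
+     // slave host, pointing it back at the master via --host/--port.
+     void launch_slave(const std::string &slaveHost,
+                       const std::string &masterHost, uint16_t port) {
+         if (fork() == 0) {                  // child becomes the ssh process
+             std::string portStr = std::to_string(port);
+             execlp("ssh", "ssh", slaveHost.c_str(),
+                    "/path/to/raytracer",    // hypothetical program path
+                    "--host", masterHost.c_str(),
+                    "--port", portStr.c_str(), (char *)NULL);
+             _exit(1);                       // only reached if exec fails
+         }
+     }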

Implementation

+

+ When I first implemented the distribution architecture and did
+ a test run, a slave process was created on each of the hosts
+ that I specified, but only 5-10% of each CPU was being utilized.
+ I ran "netstat -taunp" and saw that the TCP connections had
+ data sitting in their send queues.
+ This meant that the program had already processed what it was
+ asked to process and was simply waiting on the network.
+ I realized that the TCP implementation was waiting to accumulate
+ data before sending it over the connection.
+ To deal with this, I used the setsockopt() call to set the
+ TCP_NODELAY flag to 1, which disables "Nagle's algorithm" in the
+ TCP subsystem for that socket, so data is sent as soon as it is
+ available.
+ Setting this option on the communication socket immediately
+ allowed each slave node to begin using nearly 100% of one of
+ its cores.
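+
+ For reference, a minimal sketch of that setsockopt() call; the helper
+ name is mine, and sock is assumed to be the already-connected
+ communication socket.
+
+     #include <netinet/in.h>
+     #include <netinet/tcp.h>
+     #include <sys/socket.h>
+
+     // Disable Nagle's algorithm so small writes are sent immediately
+     // instead of being buffered while the kernel waits for more data.
+     void disable_nagle(int sock) {
+         int flag = 1;
+         setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
+     }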

+

+ My first attempt to utilize every core on the system involved
+ turning on OpenMP with the compiler flag -fopenmp
+ and adding omp parallel for directives to the for
+ loop carrying out the render task.
+ This attempt failed: my program would abort with errors coming
+ from the C library.
+ I used gdb to examine the stack dumps, and the errors were being
+ thrown from pthread and other C memory management functions.
+ I believe this did not work because I was manually creating
+ and managing threads with the pthread system in addition to
+ trying to use OpenMP, which was probably implemented by the
+ compiler on top of pthreads.
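+
+ For context, the attempted directive had the following general form
+ (the loop body here is a placeholder, not the actual render code):
+
+     // compiled with: g++ -fopenmp ...
+     void render(int width, int height) {
+         #pragma omp parallel for
+         for (int y = 0; y < height; ++y) {
+             for (int x = 0; x < width; ++x) {
+                 // tracePixel(x, y);    // hypothetical per-pixel render call
+             }
+         }
+     }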

+

+ Because OpenMP did not work for utilizing every core on a worker
+ node, I switched to a different solution.
+ The program was already computing a list of command-line arguments
+ to pass to slave nodes, so I made the slave nodes record this list.
+ If a process was the "master slave process" (the first process
+ executed on a slave node), then the program would call
+ n = sysconf(_SC_NPROCESSORS_CONF) to retrieve the total
+ number of processors on the system.
+ Then the program simply did a fork() and execvp()
+ to execute n-1 copies of itself on the slave node.
+ Together with the original process, this gave one worker process
+ per core on the slave node.
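+
+ A minimal sketch of this spawning step, with my own function name;
+ argv stands for the recorded, NULL-terminated slave command line.
+
+     #include <sys/types.h>
+     #include <unistd.h>
+
+     // Spawn n-1 additional copies of the program so that, counting the
+     // original process, one worker runs per core on this host.
+     void spawn_per_core_workers(char *const argv[]) {
+         long n = sysconf(_SC_NPROCESSORS_CONF);
+         for (long i = 1; i < n; ++i) {
+             pid_t pid = fork();
+             if (pid == 0) {
+                 execvp(argv[0], argv);   // child re-executes the program
+                 _exit(1);                // only reached if execvp fails
+             }
+         }
+     }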

Evaluation

Future Work

+

+ The current method of utilizing each core of the worker nodes
+ involves the main process doing a fork() and execvp()
+ n-1 times, where n is the number of processors detected
+ on the slave node.
+ This has the disadvantage that the scene file is parsed and
+ initialization is redone in each of these processes.
+ This time could be saved by an implementation that instead
+ created the additional workers as threads with pthread_create()
+ after parsing the scene file and building the scene, so that all
+ workers on a node share the parsed scene.
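+
+ A rough sketch of what that could look like; Scene and render_worker
+ are hypothetical stand-ins for the real scene type and render entry
+ point.
+
+     #include <pthread.h>
+     #include <unistd.h>
+     #include <vector>
+
+     struct Scene { /* parsed scene data, built once */ };
+
+     void *render_worker(void *scene_ptr) {
+         // placeholder: each thread would render against the shared,
+         // already-parsed scene instead of re-parsing it
+         (void)scene_ptr;
+         return NULL;
+     }
+
+     void start_worker_threads(Scene *scene) {
+         long n = sysconf(_SC_NPROCESSORS_CONF);
+         std::vector<pthread_t> threads(n > 1 ? n - 1 : 0);
+         for (size_t i = 0; i < threads.size(); ++i)
+             pthread_create(&threads[i], NULL, render_worker, scene);
+         render_worker(scene);            // the original thread also renders
+         for (size_t i = 0; i < threads.size(); ++i)
+             pthread_join(threads[i], NULL);
+     }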