adding to report
git-svn-id: svn://anubis/gvsu@406 45c1a28c-8058-47b2-ae61-ca45b979098e
@@ -114,7 +114,7 @@
<p>
To create a scalable parser to read scene files according to
a grammar that I supplied, I used the bison parser generator.
I chose bison because it is very portable and worked directly
with my C++ code.
The parser produces a hierarchy of <tt>Node</tt> objects.
Each <tt>Node</tt> object represents some part of the parse tree
@@ -123,15 +123,106 @@
in order to evaluate all of the scene objects specified in the
scene file, maintaining node order and parent/child relationships.
</p>
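<p>
As an illustration only (the report's actual class layout may differ, and
these names are hypothetical), such a hierarchy and its in-order traversal
could look like this:
</p>
<pre>
// Sketch of the kind of node hierarchy the parser produces: each node
// keeps its children in order, so a recursive traversal evaluates the
// scene objects in the order they appear in the scene file.
#include &lt;vector>
#include &lt;cstddef>

class Node {
public:
    virtual ~Node() {}
    virtual void evaluate() = 0;       // build this node's scene object(s)
    void addChild(Node *child) { children.push_back(child); }
    void evaluateTree() {
        evaluate();                    // parent first,
        for (size_t i = 0; i &lt; children.size(); ++i)
            children[i]->evaluateTree();   // then children, in file order
    }
private:
    std::vector&lt;Node*> children;
};
</pre>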
<p>
I designed the task distribution architecture using a combination
of threads and processes to achieve a working master/slave setup.
When the program starts up, it determines whether it is running
in distributed mode.
If it is not running in distributed mode (no --hosts or --host
argument was supplied), then the scene is rendered sequentially,
one pixel at a time.
If, however, a --hosts or --host option is present, then the
program is running in distributed mode.
If a --hosts option is present, then the process is the master
process, and it reads the list of hosts to use as slave processes
from the file specified.
If a --host and a --port option are provided, then the process is
a slave process, and it connects to the master process at the
hostname provided, on the TCP port provided.
</p>
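<p>
A minimal sketch of this startup dispatch, with hypothetical helper
names standing in for the program's actual functions:
</p>
<pre>
// Sketch of the startup dispatch described above.  getOption() and the
// three run*() entry points are illustrative stand-ins.
#include &lt;string.h>
#include &lt;stdlib.h>

void runMaster(const char *hostsFile);       // read host list, spawn slaves
void runSlave(const char *host, int port);   // connect back to the master
void renderSequentially();                   // no distribution

static const char *getOption(int argc, char **argv, const char *flag) {
    for (int i = 1; i + 1 &lt; argc; ++i)
        if (strcmp(argv[i], flag) == 0)
            return argv[i + 1];              // value following the flag
    return 0;
}

int main(int argc, char **argv) {
    const char *hosts = getOption(argc, argv, "--hosts");
    const char *host  = getOption(argc, argv, "--host");
    const char *port  = getOption(argc, argv, "--port");

    if (hosts)
        runMaster(hosts);                    // master process
    else if (host &amp;&amp; port)
        runSlave(host, atoi(port));          // slave process
    else
        renderSequentially();                // one pixel at a time
    return 0;
}
</pre>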
<p>
In distributed mode, the master process creates a server thread.
This server thread listens on a free TCP port (chosen by the
system) for incoming connections.
Each time a slave node connects to this port, another connection
thread is spawned to handle the master node's side of communication
with that slave node.
After the master process starts the listen thread, it also forks
and execs an ssh process as a sub-process.
This ssh process connects to the slave node and begins executing
a copy of the program there, informing it of the hostname and port
of the master node via the --host and --port command-line options.
</p>
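<p>
Sketched under the same assumptions (the helper names and the
<tt>./raytracer</tt> binary name are placeholders), the master's side
could look roughly like this:
</p>
<pre>
// Bind to port 0 so the system chooses a free port, recover the chosen
// port with getsockname(), then fork and exec ssh to start a copy of
// the program on the slave, handing back --host and --port.
#include &lt;sys/socket.h>
#include &lt;netinet/in.h>
#include &lt;string.h>
#include &lt;stdio.h>
#include &lt;unistd.h>

int listenOnFreePort(unsigned short *portOut) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&amp;addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = 0;                        // 0 = system picks a free port
    bind(fd, (struct sockaddr *)&amp;addr, sizeof(addr));
    listen(fd, 16);
    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&amp;addr, &amp;len);  // learn the chosen port
    *portOut = ntohs(addr.sin_port);
    return fd;
}

void spawnSlave(const char *slave, const char *masterHost, unsigned short port) {
    if (fork() == 0) {                        // child becomes the ssh sub-process
        char portStr[16];
        snprintf(portStr, sizeof(portStr), "%u", (unsigned)port);
        execlp("ssh", "ssh", slave, "./raytracer",
               "--host", masterHost, "--port", portStr, (char *)0);
        _exit(1);                             // only reached if exec failed
    }
}
</pre>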

<a name="implementation" />
<h4>Implementation</h4>
<p>
When I first implemented the distribution architecture and did
a test run, a slave process was created on each of the hosts
that I specified, but only 5-10% of each CPU was being utilized.
I ran <tt>netstat -taunp</tt> and saw that the TCP connections had
data in their send queues.
This meant that the program had already processed what it was
asked to process and was simply waiting for the network.
I realized that the TCP implementation was waiting to accumulate
data before sending it over the connection.
To deal with this, I used the <tt>setsockopt()</tt> call to set
the <tt>TCP_NODELAY</tt> flag to 1, which disables Nagle's
algorithm in the TCP subsystem for that socket, so data is sent
as soon as it is available.
Setting this option on the communication socket immediately
allowed each slave node to begin using nearly 100% of one of
its cores.
</p>
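<p>
The fix amounts to a few lines; here <tt>sock</tt> stands for the
connected communication socket:
</p>
<pre>
// Disable Nagle's algorithm on this socket so data is sent immediately.
#include &lt;sys/socket.h>
#include &lt;netinet/in.h>
#include &lt;netinet/tcp.h>    // TCP_NODELAY
#include &lt;stdio.h>

int flag = 1;
if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &amp;flag, sizeof(flag)) != 0)
    perror("setsockopt(TCP_NODELAY)");
</pre>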
<p>
My first attempt to utilize each core on the system involved
turning on OpenMP with the compiler flag <tt>-fopenmp</tt>
and adding <tt>omp parallel for</tt> directives to the for
loop carrying out the render task.
This attempt failed: my program would abort with errors coming
from the C library.
I used gdb to examine stack dumps, and the errors were being
thrown from pthread and other C memory-management functions.
I believe this did not work because I was manually creating
and managing threads with the pthread system in addition to
trying to use OpenMP, which was probably itself implemented by
the compiler using pthreads.
</p>
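<p>
For reference, the failed approach was along these lines (the loop
bounds and the per-pixel call are illustrative):
</p>
<pre>
// Sketch of the OpenMP attempt: compiled with g++ -fopenmp, this
// parallelizes the pixel loop while pthreads were already managing
// their own threads elsewhere in the program.
#pragma omp parallel for
for (int y = 0; y &lt; height; ++y)
    for (int x = 0; x &lt; width; ++x)
        image[y][x] = renderPixel(x, y);
</pre>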
<p>
Because OpenMP did not work to utilize every core on a worker
node, I switched to a different solution.
The program was already computing a list of command-line arguments
to pass to slave nodes, so I made the slave nodes record this list.
If a process was the "master slave process" (the first process
executed on that slave node), then the program would call
<tt>n = sysconf(_SC_NPROCESSORS_CONF)</tt> to retrieve the total
number of processors on the system.
Then the program simply did a <tt>fork()</tt> and <tt>execvp()</tt>
to execute <tt>n-1</tt> copies of itself on the slave node.
In this way, one worker process was spawned per core on the
slave node.
</p>
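<p>
A sketch of that fan-out, assuming <tt>argvCopy</tt> is the recorded
argument list mentioned above:
</p>
<pre>
// Per-core fan-out on a slave node: fork and re-exec n-1 extra copies.
#include &lt;unistd.h>

long n = sysconf(_SC_NPROCESSORS_CONF);   // total processors on this host
for (long i = 1; i &lt; n; ++i) {            // start n-1 extra copies
    if (fork() == 0) {                    // child process
        execvp(argvCopy[0], argvCopy);    // re-exec this same program
        _exit(1);                         // only reached if exec failed
    }
}
</pre>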

<a name="evaluation" />
<h4>Evaluation</h4>

<a name="futurework" />
<h4>Future Work</h4>
<p>
The current method to utilize each core of the worker nodes involves
the main process doing a <tt>fork()</tt> and <tt>execvp()</tt>
<em>n</em>-1 times, where <em>n</em> is the number of processors
detected on the slave node.
This has the disadvantage that the scene file is parsed and
initialization is redone for each of these processes.
This time could be saved by an implementation that instead
created additional threads with <tt>pthread_create()</tt> after
parsing the scene file and building the scene.
</p>
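<p>
A sketch of this alternative, with a hypothetical worker entry point:
</p>
<pre>
// After parsing and building the scene once, start n-1 worker threads
// instead of re-exec'ing whole processes; the threads share the scene.
#include &lt;pthread.h>
#include &lt;unistd.h>

void *renderWorker(void *scene);          // hypothetical worker entry point

void startWorkers(void *scene) {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    for (long i = 1; i &lt; n; ++i) {
        pthread_t tid;
        pthread_create(&amp;tid, 0, renderWorker, scene);
    }
}
</pre>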

</body>
</html>