adding to report
git-svn-id: svn://anubis/gvsu@406 45c1a28c-8058-47b2-ae61-ca45b979098e
This commit is contained in:
parent 8c9ab81429
commit f8b570e06b
@ -114,7 +114,7 @@
<p>
To create a scalable parser that reads scene files according to
a grammar I supplied, I used the bison parser generator.
I chose bison because it is very portable and worked directly
with my C++ code.
The parser produces a hierarchy of <tt>Node</tt> objects.
Each <tt>Node</tt> object represents some part of the parse tree
@ -123,15 +123,106 @@
in order to evaluate all of the scene objects specified in the
scene file, maintaining node order and parent/child relationships.
</p>
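<p>
As an illustration (not the report's actual class), a minimal C++
sketch of such a node might look like the following; the member
names are assumptions:
</p>
<pre>
#include &lt;cstddef>
#include &lt;vector>

// Children are stored in parse order, so a pre-order walk preserves
// node order and parent/child relationships.
class Node {
public:
    virtual ~Node() {}
    void addChild(Node *child) { children_.push_back(child); }
    virtual void evaluate() {                 // hypothetical evaluation hook
        for (std::size_t i = 0; i &lt; children_.size(); ++i)
            children_[i]->evaluate();
    }
private:
    std::vector&lt;Node*> children_;
};
</pre>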
<p>
I designed the task distribution architecture using a combination
of threads and processes to achieve a working master/slave setup.
When the program starts up, it determines whether it is running
in distributed mode.
If it is not (no --hosts or --host argument was supplied), then
the scene is rendered sequentially, one pixel at a time.
If, however, a --hosts or --host option is present, then the
program is running in distributed mode.
If a --hosts option is present, then the process is the master
process, and it reads the list of hosts to use as slave processes
from the file specified.
If --host and --port options are provided, then the process is
a slave process, and it connects to the master process at the
hostname provided, on the TCP port provided.
</p>
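<p>
A minimal sketch of this mode dispatch follows; the option names
match the description above, but the helper function and its
signature are my own invention for illustration:
</p>
<pre>
#include &lt;cstdlib>
#include &lt;cstring>
#include &lt;string>

enum Mode { SEQUENTIAL, MASTER, SLAVE };

// Caller initializes hostsFile/host to empty and port to 0.
Mode detectMode(int argc, char **argv,
                std::string &amp;hostsFile, std::string &amp;host, int &amp;port) {
    for (int i = 1; i &lt; argc - 1; ++i) {
        if (std::strcmp(argv[i], "--hosts") == 0) hostsFile = argv[i + 1];
        else if (std::strcmp(argv[i], "--host") == 0) host = argv[i + 1];
        else if (std::strcmp(argv[i], "--port") == 0) port = std::atoi(argv[i + 1]);
    }
    if (!hostsFile.empty()) return MASTER;          // read slave list from file
    if (!host.empty() &amp;&amp; port != 0) return SLAVE;   // connect back to master
    return SEQUENTIAL;                              // render pixels one at a time
}
</pre>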
<p>
In distributed mode, the master process creates a server thread.
This server thread listens on a free TCP port (chosen by the
system) for incoming connections.
Each time a slave node connects to this port, another connection
thread is spawned to handle the master side of communication
with that slave node.
After the master process starts the listen thread, it also forks
and execs an ssh process as a sub-process.
This ssh process connects to the slave node and begins executing
a copy of the program there, informing it of the hostname and
port of the master node via the --host and --port command-line
options.
</p>
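<p>
The following is a minimal sketch of that master-side setup, not
the program's actual code: error handling is omitted, and the host
names and program path are placeholders.
</p>
<pre>
#include &lt;arpa/inet.h>
#include &lt;netinet/in.h>
#include &lt;pthread.h>
#include &lt;sys/socket.h>
#include &lt;unistd.h>
#include &lt;cstdio>
#include &lt;cstdlib>

static void *handle_slave(void *arg) {    // one connection thread per slave
    int conn = *(int *)arg;
    free(arg);
    /* ... exchange work units and results with this slave ... */
    close(conn);
    return NULL;
}

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = 0;                    // port 0: the system picks a free port
    bind(srv, (sockaddr *)&amp;addr, sizeof(addr));
    socklen_t len = sizeof(addr);
    getsockname(srv, (sockaddr *)&amp;addr, &amp;len);  // recover the chosen port
    listen(srv, 16);

    char port[16];
    snprintf(port, sizeof(port), "%d", ntohs(addr.sin_port));
    if (fork() == 0) {                    // fork/exec one ssh per slave host
        execlp("ssh", "ssh", "slave.example.com", "/path/to/renderer",
               "--host", "master.example.com", "--port", port, (char *)NULL);
        _exit(127);                       // only reached if exec fails
    }

    for (;;) {                            // the listen thread's accept loop
        int *conn = (int *)malloc(sizeof(int));
        *conn = accept(srv, NULL, NULL);
        pthread_t t;
        pthread_create(&amp;t, NULL, handle_slave, conn);
    }
}
</pre>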
<a name="implementation" />
<h4>Implementation</h4>
<p>
When I first implemented the distribution architecture and did
a test run, a slave process was created on each of the hosts
I specified, but only 5-10% of each CPU was being utilized.
I ran <tt>netstat -taunp</tt> and saw that the TCP connections
had data in their send queues: the program had already processed
what it was asked to process and was simply waiting for the
network.
I realized that the TCP implementation was waiting to accumulate
data before sending it over the connection.
To deal with this, I used the <tt>setsockopt()</tt> call to set
the <tt>TCP_NODELAY</tt> flag to 1, which disables Nagle's
algorithm in the TCP subsystem for that socket so that data is
sent as soon as it is available.
Setting this option on the communication socket immediately
allowed each slave node to begin using nearly 100% of one of
its cores.
</p>
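<p>
A sketch of that option being set, assuming <tt>sock</tt> is the
already-connected communication socket:
</p>
<pre>
#include &lt;netinet/in.h>
#include &lt;netinet/tcp.h>
#include &lt;sys/socket.h>

// Disable Nagle's algorithm so each chunk of data is sent as soon
// as it is available, rather than being accumulated by the kernel.
int disableNagle(int sock) {
    int one = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &amp;one, sizeof(one));
}
</pre>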
<p>
My first attempt to utilize every core on the system involved
turning on OpenMP with the compiler flag <tt>-fopenmp</tt>
and adding <tt>omp parallel for</tt> directives to the for
loop carrying out the render task.
This attempt failed: the program aborted with errors coming
from the C library.
Examining stack dumps in gdb showed pthread and other C memory
management functions throwing the errors.
I believe this did not work because I was already manually
creating and managing threads with the pthread system while
also trying to use OpenMP, which was probably itself implemented
by the compiler using pthreads.
</p>
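<p>
The attempt looked roughly like the following sketch; the render
routine and loop bounds are placeholders, not the actual code:
</p>
<pre>
// Compiled with: g++ -fopenmp ...
void renderPixel(int x, int y);   // hypothetical per-pixel routine

void renderImage(int width, int height) {
    #pragma omp parallel for
    for (int y = 0; y &lt; height; ++y)      // rows split across OpenMP threads
        for (int x = 0; x &lt; width; ++x)
            renderPixel(x, y);
}
</pre>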
<p>
Because OpenMP did not work to utilize every core on a worker
node, I switched to a different solution.
The program was already computing a list of command-line
arguments to pass to slave nodes, so I made the slave nodes
record this list.
If a process was the "master slave process" (the first process
executed on its slave node), then the program would call
<tt>n = sysconf(_SC_NPROCESSORS_CONF)</tt> to retrieve the total
number of processors on the system.
Then the program simply did a <tt>fork()</tt> and <tt>execvp()</tt>
to execute <tt>n-1</tt> copies of itself on the slave node.
In this way, one worker process was spawned per core on the
slave node.
</p>
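<p>
A minimal sketch of that spawning step; the argument-list variable
is assumed and stands for the recorded command line described
above:
</p>
<pre>
#include &lt;unistd.h>

void spawnWorkers(char *const argv[]) {      // argv: recorded slave arguments
    long n = sysconf(_SC_NPROCESSORS_CONF);  // total processors on this node
    for (long i = 0; i + 1 &lt; n; ++i) {       // launch n-1 extra copies
        if (fork() == 0) {
            execvp(argv[0], argv);           // re-exec this same program
            _exit(127);                      // only reached if exec fails
        }
    }
}
</pre>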
<a name="evaluation" />
<h4>Evaluation</h4>

<a name="futurework" />
<h4>Future Work</h4>
<p>
The current method of utilizing each core of the worker nodes
has the main process do a <tt>fork()</tt> and <tt>execvp()</tt>
<em>n</em>-1 times, where <em>n</em> is the number of processors
detected on the slave node.
This has the disadvantage that the scene file is parsed and
initialization is redone in each of these processes.
This time could be saved by an implementation that instead
created additional threads with <tt>pthread_create()</tt> after
parsing the scene file and building the scene.
</p>
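<p>
A sketch of the proposed alternative; the worker signature and
shared scene pointer are assumptions:
</p>
<pre>
#include &lt;pthread.h>
#include &lt;unistd.h>

void *renderWorker(void *scene) {
    /* ... render pixels from the already-built scene ... */
    return NULL;
}

void spawnWorkerThreads(void *scene) {       // called once, after parsing
    long n = sysconf(_SC_NPROCESSORS_CONF);
    for (long i = 0; i + 1 &lt; n; ++i) {       // n-1 workers share the scene
        pthread_t t;
        pthread_create(&amp;t, NULL, renderWorker, scene);
    }
}
</pre>
<p>
Because the threads share the process's memory, the scene file
would be parsed and the scene built only once.
</p>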
</body>
</html>