<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>FART Final Report</title>
<style type="text/css">
body {
  margin: 0.2em 2ex;
  background-color: #DEF;
}
.link {
  padding: 0.2em 0.5ex;
  text-decoration: none;
  border: solid 1px black;
  background-color: #CCC;
  color: #000;
  font-weight: bold;
}
.link:hover {
  background-color: #999;
}
hr { margin: 2em auto; }
img { border-style: none; }
</style>
</head>

<body>
<h1 style="text-align:center">FART Final Report</h1>
<div style="text-align:center">
  <a class="link" href="index.html#description">Project Description</a>
  <a class="link" href="index.html#code">Code</a>
  <a class="link" href="design.html">Design Document</a>
  <a class="link" href="index.html#screenshots">Screenshots</a>
  <a class="link" href="report.html">Final Report</a>
</div>

<hr/>
<h2>FART Final Report</h2>

<h4>Table of Contents</h4>
<ol>
  <li><a href="#problem">The Problem</a></li>
  <li><a href="#model">Model</a></li>
  <li><a href="#design">Design</a></li>
  <li><a href="#implementation">Implementation</a></li>
  <li><a href="#evaluation">Evaluation</a></li>
  <li><a href="#futurework">Future Work</a></li>
</ol>
<a name="problem" />
|
|
<h4>The Problem</h4>
|
|
<p>
|
|
FART is my semester project for the Computer Science 658 class
|
|
at Grand Valley State University.
|
|
My goal is for FART to be a distributed, fault-tolerant,
|
|
object-oriented raytracer.
|
|
The name is a
|
|
<a href="http://en.wikipedia.org/wiki/Recursive_acronym">recursive acronym</a>
|
|
similar to GNU or LAME.
|
|
</p>
|
|
<p>
The input to the program is a scene description file.
Scene description files are structured ASCII documents whose format is a
hierarchical layout of scene options and elements.
After the scene file is parsed, the scene is raytraced (rendered) according
to the render options.
Some of these options can be specified on the command line and others in the
scene description file.
When an option can be specified in both places, the command-line value
overrides the one in the scene file.
</p>
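<p>
As an illustration of that precedence rule only, the merge might look like
the following sketch (the <tt>RenderOptions</tt> structure and its fields are
hypothetical, not FART's actual option handling):
</p>
<pre>
// Hedged sketch: RenderOptions and its fields are made up for illustration.
// The rule shown is the one described above: command-line values win.
struct RenderOptions {
    int width;
    int height;
};

RenderOptions mergeOptions(RenderOptions fromScene, RenderOptions fromCmdline,
                           bool widthOnCmdline, bool heightOnCmdline)
{
    RenderOptions result = fromScene;            // start with scene-file values
    if (widthOnCmdline)  result.width  = fromCmdline.width;    // command line wins
    if (heightOnCmdline) result.height = fromCmdline.height;
    return result;
}
</pre>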
<p>
Rendering can be done using the built-in task distribution infrastructure.
A command-line option specifies a "hosts file" containing the list of hosts
to use while rendering the scene.
The algorithm is fault-tolerant: if any worker node goes down, its tasks are
redistributed so that other nodes complete the work the faulty node left
unfinished.
</p>
<a name="model" />
|
|
<h4>Model</h4>
|
|
<p>
|
|
I used an object-oriented model for the program.
|
|
Each displayable scene object is represented by a C++ object
|
|
which inherits from the <tt>Shape</tt> base-class, while light objects
|
|
inherit from a base <tt>Light</tt> class.
|
|
</p>
|
|
<p>
|
|
<tt>Shape</tt> objects implement an <tt>intersect()</tt> method.
|
|
Polymorphism is utilized so that when the scene attempts to
|
|
intersect a <tt>Ray</tt> with a shape object, it does not need
|
|
to know the actual type of shape object being intersected with.
|
|
</p>
|
|
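<p>
A minimal sketch of that class layout is shown below; <tt>Shape</tt>,
<tt>Light</tt>, <tt>Ray</tt>, and <tt>intersect()</tt> come from the model
above, while the parameter list and the <tt>Sphere</tt> example are
assumptions made for illustration.
</p>
<pre>
// Hedged sketch of the polymorphic shape interface described above.
class Ray;                         // origin + direction (details omitted)

class Shape {
public:
    virtual ~Shape() {}
    // Each concrete shape decides how a ray intersects it; the scene only
    // ever calls this through a Shape pointer.
    virtual bool intersect(const Ray &amp;ray, double &amp;distance) const = 0;
};

class Sphere : public Shape {
public:
    virtual bool intersect(const Ray &amp;ray, double &amp;distance) const {
        // ... solve the ray/sphere quadratic here ...
        return false;
    }
};

class Light {
public:
    virtual ~Light() {}            // point lights, etc. derive from this
};
</pre>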
<p>
The task distribution infrastructure is implemented in a class called
<tt>distrib</tt>.
This class contains methods to start up a server thread, launch slave
processes on the worker nodes, and connect back from the slave processes to
the master server.
</p>
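<p>
The exact method signatures are not reproduced in this report, so the
following is only a rough sketch of what such an interface could look like;
every member name besides the class name <tt>distrib</tt> is an assumption.
</p>
<pre>
#include &lt;string&gt;
#include &lt;vector&gt;

// Hedged sketch of a distrib-style interface (illustrative names only).
class distrib {
public:
    // Master side: start the server thread that accepts slave connections.
    void startServer();

    // Master side: fork/exec ssh to launch one slave process per host.
    void launchSlaves(const std::vector&lt;std::string&gt; &amp;hosts);

    // Slave side: connect back to the master process at host:port.
    void connectToMaster(const std::string &amp;host, int port);
};
</pre>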
<a name="design" />
|
|
<h4>Design</h4>
|
|
<p>
|
|
The design document for FART is available
|
|
<a href="design.html">here</a>.
|
|
</p>
|
|
<p>
To create a parser that reads scene files according to a grammar I supplied,
I used the bison parser generator.
I chose bison because it is very portable and has a C++ interface, so it
integrates cleanly with the rest of my code.
The parser produces a hierarchy of <tt>Node</tt> objects, each representing
some part of the parse tree obtained from the scene description file.
The parse tree can therefore be traversed after parsing is complete in order
to evaluate all of the scene objects specified in the scene file, while
maintaining node order and parent/child relationships.
</p>
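<p>
A hedged sketch of such a parse-tree node follows; only the <tt>Node</tt>
name comes from the design, while the child storage and the
<tt>evaluate()</tt> traversal are assumptions about how the tree could be
walked.
</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hedged sketch: one node of the parse tree produced by the bison parser.
class Node {
public:
    virtual ~Node() {
        for (std::size_t i = 0; i &lt; children.size(); ++i)
            delete children[i];
    }

    void addChild(Node *child) { children.push_back(child); }

    // Walk the subtree in order, preserving parent/child relationships,
    // building the corresponding scene objects as we go.
    virtual void evaluate() {
        for (std::size_t i = 0; i &lt; children.size(); ++i)
            children[i]-&gt;evaluate();
    }

private:
    std::vector&lt;Node*&gt; children;
};
</pre>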
<p>
I designed the task distribution architecture using a combination of threads
and processes to achieve a working master/slave setup.
When the program starts up, it determines whether or not it is running in
distributed mode.
If it is not running in distributed mode (no <tt>--hosts</tt> or
<tt>--host</tt> argument was supplied), then the scene is rendered
sequentially, one pixel at a time.
If, however, a <tt>--hosts</tt> or <tt>--host</tt> option is present, then
the program is running in distributed mode.
If a <tt>--hosts</tt> option is present, then the process is the master
process and it reads the list of hosts to use as slave processes from the
file specified.
If <tt>--host</tt> and <tt>--port</tt> options are provided, then the process
is a slave process and it connects to the master process at the hostname and
TCP port provided.
</p>
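<p>
Reduced to a sketch, the decision looks like this; the helper functions
(<tt>hasOption()</tt>, <tt>getOption()</tt>, and the three run functions) are
hypothetical stand-ins, not FART's real entry points.
</p>
<pre>
#include &lt;cstdlib&gt;
#include &lt;string&gt;

// Hypothetical helpers, declared only so the sketch is self-contained.
bool hasOption(const std::string &amp;name);
std::string getOption(const std::string &amp;name);
void runMaster(const std::string &amp;hostsFile);
void runSlave(const std::string &amp;host, int port);
void renderSequentially();

void chooseMode() {
    if (hasOption("--hosts")) {
        // Master: read the slave host list from the named file.
        runMaster(getOption("--hosts"));
    } else if (hasOption("--host") &amp;&amp; hasOption("--port")) {
        // Slave: connect back to the master at the given host and TCP port.
        runSlave(getOption("--host"), std::atoi(getOption("--port").c_str()));
    } else {
        // No distribution options: render sequentially, one pixel at a time.
        renderSequentially();
    }
}
</pre>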
<p>
In distributed mode, the master process creates a server thread.
This server thread listens on a free TCP port (chosen by the system) for
incoming connections.
Each time a slave node connects to this port, a connection thread is spawned
to handle the master side of communication with that slave.
After the master process starts the listen thread, it also forks and execs an
ssh process as a sub-process.
This ssh process connects to the slave node and begins executing a copy of
the program there, informing it of the hostname and port of the master node
via the <tt>--host</tt> and <tt>--port</tt> command-line options.
</p>
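<p>
The fork/exec of the ssh sub-process might look roughly like the sketch
below; the remote command name and the exact argument layout are assumptions,
since the real command line is not reproduced in this report.
</p>
<pre>
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;
#include &lt;cstdio&gt;

// Hedged sketch: launch a slave on `host` via ssh, telling it how to call
// back to the master.  "fart" stands in for the remote program name.
void launchSlave(const char *host, const char *masterHost, const char *masterPort)
{
    pid_t pid = fork();
    if (pid == 0) {
        // Child: replace ourselves with ssh, which runs the raytracer on the
        // slave node with the call-back options.
        execlp("ssh", "ssh", host,
               "fart", "--host", masterHost, "--port", masterPort,
               (char *)NULL);
        std::perror("execlp");        // only reached if exec fails
        _exit(1);
    }
    // Parent: the server thread will accept the slave's connection shortly.
}
</pre>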
<a name="implementation" />
|
|
<h4>Implementation</h4>
|
|
<p>
|
|
When I first implemented the distribution architecture and did
|
|
a test-run, a slave process was created on each of the hosts
|
|
that I specified, but only 5-10% of each CPU was being utilized.
|
|
I ran "netstat -taunp" and saw that the TCP connections had
|
|
data in their send queues.
|
|
This meant that the program had already processed what it was
|
|
asked to process and was simply waiting for the network traffic
|
|
to be processed.
|
|
I realized that the TCP implementation was waiting to accumulate
|
|
data before sending it over the TCP connection.
|
|
In order to deal with this, I used the <tt>setsockopt()</tt>
|
|
call to set the <tt>TCP_NODELAY</tt> flag to 1.
|
|
This disabled "Nagle's algorithm" in the TCP subsystem for
|
|
that socket.
|
|
This meant that data would be sent as soon as it was available.
|
|
Setting this option on the communication socket immediately
|
|
allowed each slave node to begin using pretty much 100% of
|
|
one of its cores.
|
|
</p>
|
|
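<p>
The fix itself is a single <tt>setsockopt()</tt> call on the connected
socket:
</p>
<pre>
#include &lt;netinet/in.h&gt;
#include &lt;netinet/tcp.h&gt;
#include &lt;sys/socket.h&gt;

// Disable Nagle's algorithm so that each small result message is sent as
// soon as it is written instead of being buffered.  `sock` is the connected
// TCP socket descriptor.
int enableNoDelay(int sock)
{
    int flag = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &amp;flag, sizeof(flag));
}
</pre>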
<p>
My first attempt to utilize every core on the system involved turning on
OpenMP with the compiler flag <tt>-fopenmp</tt> and adding
<tt>omp parallel for</tt> directives to the for loop carrying out the render
task.
This attempt failed: the program would abort with errors coming from the C
library.
Examining the stack traces in gdb showed pthread and other C memory
management functions throwing the errors.
I believe this did not work because I was already manually creating and
managing threads with the pthread system, in addition to trying to use
OpenMP, which the compiler probably implements using pthreads as well.
</p>
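<p>
For reference, the attempted change amounted to a directive of roughly this
shape on the render loop (the loop body is simplified, with
<tt>tracePixel()</tt> standing in for the real per-pixel work):
</p>
<pre>
// Rough shape of the attempted OpenMP change (compiled with -fopenmp).
double tracePixel(int x, int y);   // stand-in for the per-pixel render work

void renderWithOpenMP(int width, int height, double *image)
{
    #pragma omp parallel for
    for (int y = 0; y &lt; height; ++y) {
        for (int x = 0; x &lt; width; ++x) {
            image[y * width + x] = tracePixel(x, y);   // independent pixels
        }
    }
}
</pre>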
<p>
Because OpenMP did not work for utilizing every core on a worker node, I
switched to a different solution.
The program was already computing a list of command-line arguments to pass to
slave nodes, so I made the slave nodes record this list.
If a process was the "master slave process" (the first process executed on
that slave node), then the program would call
<tt>n = sysconf(_SC_NPROCESSORS_CONF)</tt> to retrieve the total number of
processors on the system.
Then the program simply did a <tt>fork()</tt> and <tt>execvp()</tt> to
execute <tt>n-1</tt> additional copies of itself on the slave node.
In this way, one worker process was spawned per core on the slave node.
</p>
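<p>
A sketch of the per-core spawning on the "master slave process" follows;
<tt>argvCopy</tt> stands in for the recorded argument list, and error
handling is omitted.
</p>
<pre>
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

// Hedged sketch: spawn one extra worker per remaining core on this host.
// argvCopy is the recorded, NULL-terminated argument list described above.
void spawnPerCoreWorkers(char *const argvCopy[])
{
    long n = sysconf(_SC_NPROCESSORS_CONF);   // total processors on this node
    for (long i = 1; i &lt; n; ++i) {            // this process is already worker 0
        pid_t pid = fork();
        if (pid == 0) {
            execvp(argvCopy[0], argvCopy);    // child: re-exec ourselves
            _exit(1);                         // only reached if exec fails
        }
    }
}
</pre>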
<p>
I had originally planned to implement fault tolerance in the distribution
architecture by establishing a second TCP connection from each slave node to
the master, which would serve as a polling connection to make sure that the
slaves were still alive.
During implementation, I arrived at a more elegant solution.
I was already keeping track of the set of tasks that the master process
considered "in progress".
When the master process received a request from a slave for a task to work
on, it would normally respond with the next available task number until all
tasks had been given out, and then it would respond saying that there were no
more tasks to work on.
I changed this slightly: if the master got a request from a slave for a task
to work on, and all of the tasks had already been given out, then the master
would respond with a task ID from the set of tasks currently in progress.
That way, whichever node finished the task first, its data would be
collected.
If the original node was dead, the new slave node would take over the task
and return the data.
If the original node was alive but responding very slowly, the replacement
node could finish the task and return the results before the original node.
This ended up working very well: I was able to kill all of the worker
processes on a given slave node, and the tasks they had been working on were
finished by other nodes later on.
</p>
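<p>
The master's reply logic then reduces to roughly the following; the container
names are assumptions, but the hand-out rule is the one described above.
</p>
<pre>
#include &lt;queue&gt;
#include &lt;set&gt;

// Hedged sketch of the master's task hand-out rule.  `pending` holds tasks
// not yet given out; `inProgress` holds tasks handed out but not yet
// returned.  Returns -1 when there is truly nothing left to do.
int nextTaskFor(std::queue&lt;int&gt; &amp;pending, std::set&lt;int&gt; &amp;inProgress)
{
    if (!pending.empty()) {
        int task = pending.front();           // normal case: a fresh task
        pending.pop();
        inProgress.insert(task);
        return task;
    }
    if (!inProgress.empty()) {
        // Everything has been handed out: re-issue an in-progress task so a
        // dead or slow slave cannot stall the render.
        return *inProgress.begin();
    }
    return -1;                                // all tasks finished
}
</pre>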
<a name="evaluation" />
|
|
<h4>Evaluation</h4>
|
|
<p>
|
|
I designed the raytracer itself in the first half of the semester
|
|
and got it working well using a sequential approach.
|
|
In the second half of the semester I added the task distribution
|
|
infrastructure which allowed the raytracer to break up the
|
|
work of rendering an entire scene into tasks of a small size.
|
|
These tasks were then dispersed among slave processes which
|
|
were launched using the distribution infrastructure.
|
|
One slave process was created for every processor on each host
|
|
involved in the render.
|
|
I tested this design in the Grand Valley EOS computer lab.
|
|
A serial render of a given scene on eos01 took 38 minutes.
|
|
When I rendered it using the distribution infrastructure on
|
|
all 24 nodes (64 cores), it took about 38 seconds.
|
|
This is a speedup of about 60 times.
|
|
</p>
|
|
<p>
I was able to implement my program very close to my original design.
There are some features of the raytracer itself (not the distribution part)
that I was not able to implement, but that I plan to add at a later time.
Such features include other <tt>Light</tt> types such as a directional light,
a jitter parameter on lights that can be used to render soft shadows, a scene
pre-parser that would allow shape definitions and looping with variables to
create patterns of objects, texture support on the surfaces of objects, and
light refraction through semi-transparent objects.
</p>
<a name="futurework" />
|
|
<h4>Future Work</h4>
|
|
<p>
|
|
The current method to utilize each core of the worker nodes involves
|
|
the main process doing a <tt>fork()</tt> and <tt>execvp()</tt>
|
|
<em>n</em>-1 times, where <em>n</em> is the number
|
|
of processors detected
|
|
on the slave node.
|
|
This has the disadvantage that the scene-file is parsed and
|
|
initialization is repeated for each of these processes.
|
|
This time could be saved by an implementation that instead
|
|
created more threads with <tt>pthread_create()</tt> after
|
|
parsing the scene file and building the scene, and then
|
|
used these threads to split up the work for a given task.
|
|
</p>
|
|
|
|
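<p>
A minimal sketch of that alternative is shown below; the worker entry point
and the way work would be divided are assumptions.
</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;pthread.h&gt;

// Hedged sketch of the proposed thread-based alternative: after the scene is
// parsed once, start n-1 render threads instead of forking whole processes.
void *renderWorker(void *arg)
{
    // ... render the pixels assigned to this thread ...
    (void)arg;
    return NULL;
}

void startRenderThreads(long n, pthread_t *threads)
{
    for (long i = 1; i &lt; n; ++i)              // the calling thread is worker 0
        pthread_create(&amp;threads[i], NULL, renderWorker, (void *)i);

    for (long i = 1; i &lt; n; ++i)
        pthread_join(threads[i], NULL);       // wait for all workers to finish
}
</pre>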
</body>
</html>