diff --git a/cs658/html/report.html b/cs658/html/report.html
index f15276e..22f0fac 100644
--- a/cs658/html/report.html
+++ b/cs658/html/report.html
@@ -114,7 +114,7 @@

 To create a scalable parser to read scene files according to
 a grammar that I supplied, I used the bison parser generator.
-I used bison because it is very portable and worked directly
+I chose bison because it is very portable and worked directly
 with my C++ code.
 The parser produces a hierarchy of Node objects.
 Each Node object represents some part of the parse tree
@@ -123,15 +123,106 @@
 in order to evaluate all of the scene objects specified in the
 scene file, maintaining node order and parent/child relationships.
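+
+ For illustration only, the sketch below shows the kind of Node
+ hierarchy described above; the class is a simplified stand-in,
+ and its member names are my assumptions rather than the actual
+ interface.
+
+     // Illustrative sketch only, not the actual Node class: a parse-tree
+     // node that keeps its children in parse order and records its parent.
+     #include <string>
+     #include <vector>
+
+     class Node {
+     public:
+         explicit Node(const std::string &type) : type_(type), parent_(NULL) {}
+
+         // Children are stored in the order they were parsed, so walking
+         // them front to back preserves the order of scene-file objects.
+         void addChild(Node *child) {
+             child->parent_ = this;
+             children_.push_back(child);
+         }
+
+         void evaluate() {
+             // evaluate this node, then its children in order (placeholder)
+             for (size_t i = 0; i < children_.size(); ++i)
+                 children_[i]->evaluate();
+         }
+
+     private:
+         std::string type_;
+         Node *parent_;
+         std::vector<Node *> children_;
+     };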

+

+ I designed the task distribution architecture using a combination
+ of threads and processes to achieve a working master/slave setup.
+ When the program starts, it determines whether or not it is
+ running in distributed mode.
+ If it is not running in distributed mode (no --hosts or --host
+ argument was supplied), then the scene is rendered sequentially,
+ one pixel at a time.
+ If, however, a --hosts or --host option is present, then the
+ program is running in distributed mode.
+ If a --hosts option is present, then the process is the master
+ process and it reads, from the specified file, the list of hosts
+ on which to run slave processes.
+ If --host and --port options are provided, then the process is
+ a slave process and it connects to the master process at the
+ given hostname on the given TCP port.
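+
+ As a rough illustration (this is a simplified sketch, not the
+ project's actual source), the mode decision described above could
+ look like the following; the option parsing and the placeholder
+ comments are my assumptions.
+
+     #include <stdlib.h>
+     #include <string.h>
+     #include <string>
+
+     int main(int argc, char **argv) {
+         std::string hostsFile, masterHost;
+         int masterPort = 0;
+         for (int i = 1; i < argc; ++i) {
+             if (!strcmp(argv[i], "--hosts") && i + 1 < argc)
+                 hostsFile = argv[++i];
+             else if (!strcmp(argv[i], "--host") && i + 1 < argc)
+                 masterHost = argv[++i];
+             else if (!strcmp(argv[i], "--port") && i + 1 < argc)
+                 masterPort = atoi(argv[++i]);
+         }
+         if (!hostsFile.empty()) {
+             // master: read the slave host list, start the server thread,
+             // and launch one copy of the program per host over ssh
+         } else if (!masterHost.empty() && masterPort != 0) {
+             // slave: connect back to the master at masterHost:masterPort
+         } else {
+             // local mode: render the scene sequentially, one pixel at a time
+         }
+         return 0;
+     }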

+

+ In distributed mode, the master process creates a server thread.
+ This server thread listens on a free TCP port (chosen by the
+ system) for incoming connections.
+ Each time a slave node connects to this port, another connection
+ thread is spawned to handle the master's side of the communication
+ with that slave node.
+ After the master process starts the listen thread, it also forks
+ and execs an ssh process as a sub-process.
+ This ssh process connects to the slave node and starts a copy of
+ the program there, informing it of the master node's hostname and
+ port via the --host and --port command-line options.
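+
+ The following is a minimal sketch, under my own naming, of how the
+ master could obtain a system-chosen listening port and launch one
+ slave over ssh; open_ephemeral_listener, launch_slave, and the
+ program path are hypothetical, and error handling is omitted.
+
+     #include <arpa/inet.h>
+     #include <netinet/in.h>
+     #include <sys/socket.h>
+     #include <unistd.h>
+     #include <stdint.h>
+     #include <string>
+
+     // Open a listening TCP socket on a port chosen by the system and
+     // report which port was picked.
+     int open_ephemeral_listener(uint16_t &port_out) {
+         int fd = socket(AF_INET, SOCK_STREAM, 0);
+         sockaddr_in addr = {};
+         addr.sin_family = AF_INET;
+         addr.sin_addr.s_addr = INADDR_ANY;
+         addr.sin_port = 0;                  // 0 = let the system pick a free port
+         bind(fd, (sockaddr *)&addr, sizeof(addr));
+         listen(fd, 16);
+         socklen_t len = sizeof(addr);
+         getsockname(fd, (sockaddr *)&addr, &len);  // recover the chosen port
+         port_out = ntohs(addr.sin_port);
+         return fd;
+     }
+
+     // Fork and exec an ssh sub-process that starts the program on the
+     // slave host, pointing it back at the master via --host/--port.
+     void launch_slave(const std::string &slaveHost,
+                       const std::string &masterHost, uint16_t port) {
+         if (fork() == 0) {                  // child becomes the ssh process
+             std::string portStr = std::to_string(port);
+             execlp("ssh", "ssh", slaveHost.c_str(),
+                    "/path/to/raytracer",    // hypothetical program path
+                    "--host", masterHost.c_str(),
+                    "--port", portStr.c_str(), (char *)NULL);
+             _exit(1);                       // only reached if exec fails
+         }
+     }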

Implementation

+

+ When I first implemented the distribution architecture and did
+ a test run, a slave process was created on each of the hosts
+ that I specified, but only 5-10% of each CPU was being utilized.
+ I ran "netstat -taunp" and saw that the TCP connections had
+ data sitting in their send queues.
+ This meant that the program had already processed what it was
+ asked to process and was simply waiting on the network.
+ I realized that the TCP implementation was waiting to accumulate
+ data before sending it over the connection.
+ To deal with this, I used the setsockopt() call to set the
+ TCP_NODELAY flag to 1, which disables "Nagle's algorithm" in the
+ TCP subsystem for that socket, so data is sent as soon as it is
+ available.
+ Setting this option on the communication socket immediately
+ allowed each slave node to begin using nearly 100% of one of
+ its cores.
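+
+ For reference, a minimal sketch of that setsockopt() call; the helper
+ name is mine, and sock is assumed to be the already-connected
+ communication socket.
+
+     #include <netinet/in.h>
+     #include <netinet/tcp.h>
+     #include <sys/socket.h>
+
+     // Disable Nagle's algorithm so small writes are sent immediately
+     // instead of being buffered while the kernel waits for more data.
+     void disable_nagle(int sock) {
+         int flag = 1;
+         setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
+     }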

+

+ My first attempt to utilize every core on the system involved
+ turning on OpenMP with the compiler flag -fopenmp
+ and adding omp parallel for directives to the for
+ loop carrying out the render task.
+ This attempt failed: my program would abort with errors coming
+ from the C library.
+ I used gdb to examine the stack dumps, and the errors were being
+ thrown from pthread and other C memory management functions.
+ I believe this did not work because I was manually creating
+ and managing threads with the pthread system in addition to
+ trying to use OpenMP, which was probably implemented by the
+ compiler on top of pthreads.
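+
+ For context, the attempted directive had the following general form
+ (the loop body here is a placeholder, not the actual render code):
+
+     // compiled with: g++ -fopenmp ...
+     void render(int width, int height) {
+         #pragma omp parallel for
+         for (int y = 0; y < height; ++y) {
+             for (int x = 0; x < width; ++x) {
+                 // tracePixel(x, y);    // hypothetical per-pixel render call
+             }
+         }
+     }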

+

+ Because OpenMP did not work for utilizing every core on a worker
+ node, I switched to a different solution.
+ The program was already computing a list of command-line arguments
+ to pass to slave nodes, so I made the slave nodes record this list.
+ If a process was the "master slave process" (the first process
+ executed on a slave node), then the program would call
+ n = sysconf(_SC_NPROCESSORS_CONF) to retrieve the total
+ number of processors on the system.
+ Then the program simply did a fork() and execvp()
+ to execute n-1 copies of itself on the slave node.
+ Together with the original process, this gave one worker process
+ per core on the slave node.
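+
+ A minimal sketch of this spawning step, with my own function name;
+ argv stands for the recorded, NULL-terminated slave command line.
+
+     #include <sys/types.h>
+     #include <unistd.h>
+
+     // Spawn n-1 additional copies of the program so that, counting the
+     // original process, one worker runs per core on this host.
+     void spawn_per_core_workers(char *const argv[]) {
+         long n = sysconf(_SC_NPROCESSORS_CONF);
+         for (long i = 1; i < n; ++i) {
+             pid_t pid = fork();
+             if (pid == 0) {
+                 execvp(argv[0], argv);   // child re-executes the program
+                 _exit(1);                // only reached if execvp fails
+             }
+         }
+     }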

Evaluation

Future Work

+

+ The current method of utilizing each core of the worker nodes
+ involves the main process doing a fork() and execvp()
+ n-1 times, where n is the number of processors detected
+ on the slave node.
+ This has the disadvantage that the scene file is parsed and
+ initialization is redone in each of these processes.
+ This time could be saved by an implementation that instead
+ created the additional workers as threads with pthread_create()
+ after parsing the scene file and building the scene, so that all
+ workers on a node share the parsed scene.
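+
+ A rough sketch of what that could look like; Scene and render_worker
+ are hypothetical stand-ins for the real scene type and render entry
+ point.
+
+     #include <pthread.h>
+     #include <unistd.h>
+     #include <vector>
+
+     struct Scene { /* parsed scene data, built once */ };
+
+     void *render_worker(void *scene_ptr) {
+         // placeholder: each thread would render against the shared,
+         // already-parsed scene instead of re-parsing it
+         (void)scene_ptr;
+         return NULL;
+     }
+
+     void start_worker_threads(Scene *scene) {
+         long n = sysconf(_SC_NPROCESSORS_CONF);
+         std::vector<pthread_t> threads(n > 1 ? n - 1 : 0);
+         for (size_t i = 0; i < threads.size(); ++i)
+             pthread_create(&threads[i], NULL, render_worker, scene);
+         render_worker(scene);            // the original thread also renders
+         for (size_t i = 0; i < threads.size(); ++i)
+             pthread_join(threads[i], NULL);
+     }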