adding to report

git-svn-id: svn://anubis/gvsu@406 45c1a28c-8058-47b2-ae61-ca45b979098e
josh 2009-04-16 02:07:12 +00:00
parent 8c9ab81429
commit f8b570e06b


@ -114,7 +114,7 @@
<p>
To create a scalable parser to read scene files according to
a grammar that I supplied, I used the bison parser generator.
I chose bison because it is very portable and worked directly
with my C++ code.
The parser produces a hierarchy of <tt>Node</tt> objects.
Each <tt>Node</tt> object represents some part of the parse tree
@ -123,15 +123,106 @@
in order to evaluate all of the scene objects specified in the
scene file, maintaining node order and parent/child relationships.
</p>
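<p>
A minimal sketch of how such a parse tree might be walked; the
<tt>Node</tt> name comes from the report, but the fields and the
<tt>evaluate()</tt> traversal here are illustrative, not the
program's actual classes:
</p>

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical parse-tree node: children are kept in insertion
// order, preserving the parent/child relationships from the file.
struct Node {
    std::string kind;                             // e.g. "scene", "sphere"
    std::vector<std::unique_ptr<Node>> children;  // ordered children

    // Depth-first walk: visit the parent before its children, so
    // scene objects are evaluated in the order they were parsed.
    void evaluate(int depth = 0) const {
        std::cout << std::string(depth * 2, ' ') << kind << '\n';
        for (const auto& child : children)
            child->evaluate(depth + 1);
    }
};
```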
<p>
I designed the task distribution architecture using a combination
of threads and processes to achieve a working master/slave setup.
When the program starts, it determines whether it is running
in distributed mode.
If it is not (no --hosts or --host argument was supplied), the
scene is rendered sequentially, one pixel at a time.
If a --hosts option is present, the process is the master
process: it reads the list of hosts to use as slave processes
from the specified file.
If --host and --port options are provided, the process is a
slave process: it connects to the master process at the given
hostname on the given TCP port.
</p>
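<p>
The start-up decision can be sketched as follows; the option names
(--hosts, --host) come from the report, but the enum and function
are illustrative stand-ins for the program's real argument handling:
</p>

```cpp
#include <cstring>

// Which of the three modes described above the process runs in.
enum class Mode { Sequential, Master, Slave };

// Scan the command line: --hosts marks the master, --host marks
// a slave, and neither means plain sequential rendering.
Mode detect_mode(int argc, const char* argv[]) {
    bool have_hosts = false, have_host = false;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--hosts") == 0) have_hosts = true;
        if (std::strcmp(argv[i], "--host") == 0)  have_host = true;
    }
    if (have_hosts) return Mode::Master;  // reads slave host list from file
    if (have_host)  return Mode::Slave;   // connects back to the master
    return Mode::Sequential;              // render one pixel at a time
}
```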
<p>
In distributed mode, the master process creates a server thread.
This server thread listens on a free TCP port (chosen by the
system) for incoming connections.
Each time a slave node connects to this port, a connection
thread is spawned to handle the master side of communication
with that slave node.
After starting the server thread, the master process also forks
and execs an ssh process as a sub-process.
This ssh process connects to the slave node and begins executing
a copy of the program there, informing it of the master node's
hostname and port via the --host and --port command-line options.
</p>
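<p>
A sketch of how the master might assemble and launch that ssh
sub-process; the remote binary name <tt>raytracer</tt> and the
host/port values are illustrative, not the project's actual paths:
</p>

```cpp
#include <string>
#include <vector>
#include <unistd.h>

// Build the argv for: ssh <slave> raytracer --host <master> --port <p>
// so the remote copy knows where to connect back.
std::vector<std::string> build_ssh_command(const std::string& slave_host,
                                           const std::string& master_host,
                                           int master_port) {
    return { "ssh", slave_host,
             "raytracer",                       // remote copy of the program
             "--host", master_host,             // master's hostname
             "--port", std::to_string(master_port) };
}

// Fork, then exec the assembled command in the child (one child
// per slave host); the parent keeps the child's pid.
pid_t spawn_slave(const std::vector<std::string>& cmd) {
    pid_t pid = fork();
    if (pid == 0) {                             // child: become ssh
        std::vector<char*> argv;
        for (const auto& s : cmd)
            argv.push_back(const_cast<char*>(s.c_str()));
        argv.push_back(nullptr);
        execvp(argv[0], argv.data());
        _exit(127);                             // exec failed
    }
    return pid;                                 // parent
}
```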
<a name="implementation" />
<h4>Implementation</h4>
<p>
When I first implemented the distribution architecture and did
a test run, a slave process was created on each of the hosts
I specified, but only 5-10% of each CPU was being utilized.
I ran "netstat -taunp" and saw that the TCP connections had
data sitting in their send queues: the program had already
finished the work it was asked to do and was simply waiting
on the network.
The TCP implementation was accumulating data before sending it
over the connection.
To deal with this, I used the <tt>setsockopt()</tt> call to set
the <tt>TCP_NODELAY</tt> flag to 1, which disables Nagle's
algorithm in the TCP subsystem for that socket, so data is sent
as soon as it is available.
Setting this option on the communication socket immediately
allowed each slave node to drive one of its cores at nearly
100% utilization.
</p>
<p>
My first attempt to utilize every core on a node was to enable
OpenMP with the compiler flag <tt>-fopenmp</tt> and add
<tt>omp parallel for</tt> directives to the loop carrying out
the render task.
This attempt failed: the program aborted with errors from the
C library.
Examining stack dumps in gdb showed pthread and C memory
management functions raising the errors.
I believe this failed because I was already creating and
managing threads manually with the pthread system while also
using OpenMP, which the compiler likely implements on top of
pthreads.
</p>
<p>
Because OpenMP did not work for utilizing every core on a worker
node, I switched to a different solution.
The program was already computing a list of command-line
arguments to pass to slave nodes, so I made the slave nodes
record this list.
If a process is the "master slave process" (the first process
executed on a slave node), it calls
<tt>n = sysconf(_SC_NPROCESSORS_CONF)</tt> to retrieve the total
number of processors on the system.
It then uses <tt>fork()</tt> and <tt>execvp()</tt> to execute
<tt>n-1</tt> additional copies of itself on the slave node.
In this way, one worker process is spawned per core on the
slave node.
</p>
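<p>
The per-core count can be sketched as below; <tt>sysconf()</tt> with
<tt>_SC_NPROCESSORS_CONF</tt> is the call named in the report, while
the helper function is an illustrative stand-in (the actual
fork/exec loop would run it once and spawn that many copies):
</p>

```cpp
#include <unistd.h>

// On the first ("master slave") process of a node, ask the OS how
// many processors are configured; n-1 extra copies of the program
// are then fork/exec'd so one worker process runs per core.
long extra_workers_needed() {
    long n = sysconf(_SC_NPROCESSORS_CONF);  // total processors configured
    if (n < 1) n = 1;                        // defensive: sysconf can fail
    return n - 1;                            // copies left to fork/exec
}
```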
<a name="evaluation" />
<h4>Evaluation</h4>
<a name="futurework" />
<h4>Future Work</h4>
<p>
The current method of utilizing each core of a worker node has
the main process doing a <tt>fork()</tt> and <tt>execvp()</tt>
<em>n</em>-1 times, where <em>n</em> is the number of processors
detected on the slave node.
This has the disadvantage that the scene file is parsed and
initialization is redone in each of these processes.
That time could be saved by an implementation that instead
created additional threads with <tt>pthread_create()</tt> after
parsing the scene file and building the scene once.
</p>
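<p>
A sketch of that proposed alternative, assuming the scene would be
shared in memory between threads; <tt>pthread_create()</tt> is the
call named above, while the row-range worker and its toy pixel
count are illustrative stand-ins for the real render task:
</p>

```cpp
#include <pthread.h>

// Arguments for one hypothetical worker thread: a half-open row
// range plus a toy "pixels rendered" result.
struct WorkerArgs {
    int first_row, last_row;
    long pixels_done;
};

// Stand-in render task: each thread would trace its own rows of
// the already-parsed scene instead of re-parsing it per process.
void* render_rows(void* p) {
    WorkerArgs* a = static_cast<WorkerArgs*>(p);
    a->pixels_done = 0;
    for (int r = a->first_row; r < a->last_row; ++r)
        a->pixels_done += 640;               // pretend each row is 640 px
    return nullptr;
}

// Spawn one worker thread with pthread_create() and wait for it.
long run_one_worker(int first_row, int last_row) {
    WorkerArgs args{first_row, last_row, 0};
    pthread_t tid;
    pthread_create(&tid, nullptr, render_rows, &args);
    pthread_join(tid, nullptr);
    return args.pixels_done;
}
```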
</body>
</html>