% Preamble
\documentclass[11pt,fleqn]{article}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{fancyhdr}
\oddsidemargin -0.25in
\textwidth 6.75in
\topmargin -0.5in
\headheight 0.75in
\headsep 0.25in
\textheight 8.75in
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\fancyhf{}
\lhead{HW Chap. 6\\\ \\\ }
\rhead{Josh Holtrop\\2008-11-19\\CS 677}
\rfoot{\thepage}

\begin{document}
\noindent
\begin{enumerate}

\item[1.]{
The obvious benefit of non-blocking communication is that the calling process does not block while the communication takes place, leaving it free to perform further computation or to work on something else. Another benefit is that certain kinds of deadlock can be avoided, because the send call does not block until the message is received (for example, when two machines each perform a send operation followed by a receive operation). One challenge of non-blocking communication is that it is harder to program safely: the programmer must take care that a buffer involved in a non-blocking operation is not modified while the operation is still in flight. A second challenge is that if synchronization is necessary (i.e., the sender wants to know when the message was received), the program must explicitly test or wait on the outstanding request to obtain that information. A minimal sketch of this pattern appears after the answers.
}
\vskip 1em

\item[2.]{
Assume the mesh is $n \times n$ and that the node performing the scatter sits in the top-left corner. First, the total data is divided into $n$ sections. The scattering node keeps one of these sections for itself and sends the remaining $n-1$ down to the next node in its column. The nodes along the left edge continue forwarding sections downward, each keeping one section for itself. Then each left-edge node breaks its section into $n$ parts and repeats the same process along its row, distributing the $n$ subparts to the nodes on its right. This method takes $2(n-1)$ steps: $n-1$ hops down the left column followed by $n-1$ hops across each row (a worked check appears after the answers). The message transfer size is not fixed: the messages near the scattering node are large, and subsequent messages shrink as the data travels farther from it.
}
\vskip 1em

\item[3.]{
I wrote an MPI application that incremented a length variable from 100 to 100000 and then sent and received a message of that length (the master did a send, then a receive, while the slave did a receive, then a send). For each length value, I repeated this test 100 times and averaged the times to get the final round-trip time. Finally, I divided the message length by the round-trip time to get the round-trip throughput (bytes per second) from one MPI host to another and back again. I recorded the length that gave the highest round-trip throughput and printed it out at the end of the test (a sketch of the timing loop appears after the answers). Unfortunately, each time I ran the test the optimal length value varied significantly: sometimes it printed 6700 or 8400, and sometimes 27000 or 86000. I am not sure whether this is because, beyond a certain length, MPI simply packs the data into same-sized packets for transfer, or whether for some other reason all transfers take about the same amount of time, making it relatively random which length comes out most efficient.
}

\end{enumerate}
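\vskip 1em
\noindent
The following is a minimal sketch in C of the non-blocking pattern from problem 1. It assumes exactly two ranks, and the buffer names are illustrative rather than taken from any particular program. Because each rank posts its send without blocking, the send/send deadlock described above cannot occur, and \texttt{MPI\_Wait} provides the explicit synchronization point.
\begin{verbatim}
/* Sketch: two ranks exchange buffers without the send/send deadlock.
 * Assumes exactly two ranks; buffer names are illustrative. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, other, i;
    double sendbuf[1024], recvbuf[1024];
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;
    for (i = 0; i < 1024; i++)
        sendbuf[i] = rank;

    /* Post the send without blocking; sendbuf must not be modified
     * until MPI_Wait reports that the operation has completed. */
    MPI_Isend(sendbuf, 1024, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req);

    /* Both ranks can now receive, so neither blocks forever. */
    MPI_Recv(recvbuf, 1024, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    /* Explicit synchronization: wait before reusing sendbuf. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
\end{verbatim}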
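\vskip 1em
\noindent
As a check of the step count in problem 2 (my own accounting, assuming each message hop counts as one step and writing $m$ for the total data size), the column phase forwards shrinking bundles down the left edge, and the row phase does the same across each row:
\[
\text{column phase: } \frac{(n-1)m}{n} \;\to\; \frac{(n-2)m}{n} \;\to\; \cdots \;\to\; \frac{m}{n}
\qquad (n-1 \text{ steps}),
\]
\[
\text{row phase: } \frac{(n-1)m}{n^2} \;\to\; \frac{(n-2)m}{n^2} \;\to\; \cdots \;\to\; \frac{m}{n^2}
\qquad (n-1 \text{ steps per row}).
\]
The bottom-left node receives its bundle after $n-1$ steps and its row finishes $n-1$ steps later, giving $2(n-1)$ steps in total; the shrinking bundle sizes are why the message transfer size is not fixed.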
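\vskip 1em
\noindent
The following is a sketch of the timing loop from problem 3, reconstructed from the description above rather than copied from the original program; the length increment and the \texttt{REPS} constant are illustrative, and exactly two ranks are assumed.
\begin{verbatim}
/* Sketch of the round-trip benchmark: rank 0 sends then receives,
 * rank 1 receives then sends; times are averaged over REPS trials.
 * Reconstruction from the write-up; constants are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

int main(int argc, char *argv[])
{
    int rank, len, i, best_len = 0;
    double best_rate = 0.0;
    char *buf = calloc(100000, 1);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (len = 100; len <= 100000; len += 100) {
        double start = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {   /* master: send, then receive */
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {           /* slave: receive, then send */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt = (MPI_Wtime() - start) / REPS; /* avg round trip */
        if (rank == 0 && len / rtt > best_rate) {
            best_rate = len / rtt;
            best_len = len;
        }
    }

    if (rank == 0)
        printf("best length: %d bytes (%.0f bytes/s round trip)\n",
               best_len, best_rate);

    free(buf);
    MPI_Finalize();
    return 0;
}
\end{verbatim}

\end{document}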