% Preamble
\documentclass[11pt,fleqn]{article}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{fancyhdr}
\oddsidemargin -0.25in
\textwidth 6.75in
\topmargin -0.5in
\headheight 0.75in
\headsep 0.25in
\textheight 8.75in
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\fancyhf{}
\lhead{HW Chap. 4\\\ \\\ }
\rhead{Josh Holtrop\\2008-12-03\\CS 677}
\rfoot{\thepage}
\begin{document}
\noindent
\begin{enumerate}
\item[1.]{
In store-and-forward communication, the communication cost is given by
$$T_{\mathrm{comm}} = t_s + (t_h + mt_w)\ell = t_s + \ell t_h + \ell mt_w$$
In cut-through routing, the communication cost is given by
$$T_{\mathrm{comm}} = t_s + \ell t_h + mt_w$$
Since only the ``header'' incurs the per-hop overhead on each of the
$\ell$ links of the route (the message body is pipelined behind it),
cut-through routing saves $(\ell - 1)mt_w$ of communication time
relative to store-and-forward; the subtraction is shown below.
This makes cut-through routing advantageous only when $\ell > 1$,
i.e., when messages must traverse more than one link, as in networks
that are not fully connected.
}
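Subtracting the two costs above makes the saving explicit:
$$\left(t_s + \ell t_h + \ell mt_w\right) - \left(t_s + \ell t_h + mt_w\right)
= (\ell - 1)mt_w,$$
so the per-word term drops from $\ell mt_w$ to $mt_w$ while the startup
cost $t_s$ and the total header cost $\ell t_h$ are unchanged.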
\vskip 1em
\item[2.]{
We are to transpose an $n \times n$ matrix that is initially
block-decomposed row-wise among $p$ processes.
I originally interpreted the problem description to mean that after
the matrix was transposed, process $i$ would be responsible for the
same column indices for which it originally owned rows.
However, I realized that this interpretation would require no
communication at all: each process could simply treat the row data it
already holds as the corresponding column data and transpose it locally.
So, instead I interpreted the problem to mean that a given process $i$
owns certain rows of the matrix before the transposition.
After the transposition, it still owns the same row indices but
these rows should contain the transposed matrix data.
I assume that $p \leq n$ and if any process $i$ owns $k > 1$ rows of
the matrix, then the $k$ rows that it owns are contiguous.
I also assume that a gather operation can be done in $\log p$
communication steps.
The procedure I use to transpose the matrix is then as follows:
For each row $r \in \{0, 1, \ldots, n-1\}$,
a gather operation is performed.
Let $i$ be the process that owns row $r$.
In this gather operation, each process $j$ sends to $i$ the $k$
values it owns from column $r$ (where $k$ is the number of rows
assigned to process $j$).
Thus, at the end of each gather operation, process $i$ has received
the entire contents of column $r$ of the matrix
(and has now stored them in row $r$ of a receive matrix in order to
perform the transposition).
This algorithm takes $n \log p$ communication steps to complete.
This algorithm could be tweaked so that each process accumulates
the $k$ values it owns from column $r$ \textbf{for each} of the
$\ell$ columns that the receiving process will require from it,
and transmits them all to the receiving process in a single gather.
There would then be one gather per process instead of one per row of
the matrix, reducing the number of communication steps to $p \log p$.
I have implemented my first algorithm in MPI, which performs a gather
for each row of the transposed matrix; a sketch of this per-row
approach appears below.
}
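Below is a minimal sketch of the per-row gather approach in C with MPI.
It is illustrative rather than the submitted program: it assumes that
$n$ is divisible by $p$ (so every process owns exactly $k = n/p$
contiguous rows), that local blocks are stored in row-major order, and
that \texttt{MPI\_Gather} is the collective used; the function and
variable names are placeholders.
\begin{verbatim}
/* Illustrative sketch: per-row gather transpose.
 * Assumes n is divisible by p; each rank owns the k = n/p
 * contiguous rows [rank*k, (rank+1)*k) in row-major order. */
#include <mpi.h>
#include <stdlib.h>

void transpose(const double *local_rows, /* k x n block of A   */
               double *local_trans,      /* k x n block of A^T */
               int n, int rank, int p)
{
    int k = n / p;
    double *piece = malloc(k * sizeof(double)); /* this rank's part of column r */
    double *whole = malloc(n * sizeof(double)); /* full column r (root only)    */

    for (int r = 0; r < n; r++) {
        int root = r / k;                /* process that owns row r */

        /* pack the k values this rank owns from column r of A */
        for (int t = 0; t < k; t++)
            piece[t] = local_rows[t * n + r];

        /* gathering in rank order reproduces global row order, so
         * `whole` ends up holding column r of A, i.e. row r of A^T */
        MPI_Gather(piece, k, MPI_DOUBLE, whole, k, MPI_DOUBLE,
                   root, MPI_COMM_WORLD);

        if (rank == root)
            for (int c = 0; c < n; c++)
                local_trans[(r - root * k) * n + c] = whole[c];
    }
    free(piece);
    free(whole);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p, n = 8;                  /* n chosen so p divides n */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    int k = n / p;

    double *A  = malloc(k * n * sizeof(double));
    double *At = malloc(k * n * sizeof(double));
    for (int t = 0; t < k; t++)          /* fill A[i][j] = i*n + j  */
        for (int c = 0; c < n; c++)
            A[t * n + c] = (rank * k + t) * n + c;

    transpose(A, At, n, rank, p);

    free(A);
    free(At);
    MPI_Finalize();
    return 0;
}
\end{verbatim}
Because each of the $n$ iterations performs one gather ($\log p$
communication steps under the assumption above), this loop matches the
$n \log p$ step count stated earlier.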
\end{enumerate}
\end{document}