% Preamble
\documentclass[11pt,fleqn]{article}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{fancyhdr}
\oddsidemargin -0.25in
\textwidth 6.75in
\topmargin -0.5in
\headheight 0.75in
\headsep 0.25in
\textheight 8.75in
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\fancyhf{}
\lhead{HW Chap. 4\\\ \\\ }
\rhead{Josh Holtrop\\2008-12-03\\CS 677}
\rfoot{\thepage}

\begin{document}
\noindent
\begin{enumerate}
\item[1.]{
In store-and-forward communication, the communication cost is given by
$$T_{\mathrm{comm}} = t_s + (t_h + mt_w)\ell = t_s + \ell t_h + \ell mt_w$$
In cut-through routing, the communication cost is given by
$$T_{\mathrm{comm}} = t_s + \ell t_h + mt_w$$
Here $t_s$ is the startup time, $t_h$ is the per-hop time, $t_w$ is the per-word transfer time, $m$ is the message size in words, and $\ell$ is the number of links the message traverses.
In cut-through routing, the ``header'' is the only part of the communication that incurs overhead on each of the $\ell$ links.
Subtracting the two costs, cut-through routing saves $\ell mt_w - mt_w = (\ell - 1) mt_w$ of communication time.
This makes cut-through routing advantageous only when $\ell > 1$, that is, when messages cross more than one link, which happens only in networks that are not fully connected.
}
\vskip 1em
\item[2.]{
We are to transpose an $n \times n$ matrix that is initially rowwise block-decomposed among $p$ processes.
Originally I interpreted the problem description to mean that, after the matrix was transposed, process $i$ would be responsible for the same column numbers that it originally had rows for.
I then realized that this would not make much sense, since no communication would be needed at all: each process could simply transpose its own data and use its rows as its column data.
So, instead, I interpreted the problem to mean that a given process $i$ owns certain rows of the matrix before the transposition.
After the transposition, it still owns the same row indices, but these rows should contain the transposed matrix data.
I assume that $p \leq n$ and that, if any process $i$ owns $k > 1$ rows of the matrix, then the $k$ rows that it owns are contiguous.
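
To make the assumed decomposition concrete, here is a small sketch under two extra assumptions that the problem itself does not require: processes are numbered $0, 1, \ldots, p-1$, and $p$ divides $n$ evenly, so that every process owns the same number of rows $k$. Under those assumptions,
$$k = \frac{n}{p}, \qquad \text{process } i \text{ owns rows } ik,\ ik+1,\ \ldots,\ (i+1)k-1, \qquad \text{row } r \text{ is owned by process } \lfloor r/k \rfloor$$
For example, with $n = 8$ and $p = 4$ we get $k = 2$, process $1$ owns rows $2$ and $3$, and row $5$ is owned by process $\lfloor 5/2 \rfloor = 2$.
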
I also assume that a gather operation can be done in $\log p$ communication steps.

The procedure to transpose the matrix is as follows.
For each row $r \in \{0, 1, \ldots, n-1\}$, a gather operation is performed.
Let $i$ be the process that owns row $r$.
In this gather operation, each process $j$ sends to $i$ the $k$ values it owns from column $r$ (where $k$ is the number of rows assigned to process $j$).
Thus, at the end of each gather operation, process $i$ has received the entire contents of column $r$ of the matrix and has stored them in row $r$ of a receive matrix, which performs the transposition.
With $n$ gathers of $\log p$ steps each, this algorithm takes $n \log p$ communication steps to complete.

This algorithm could be tweaked so that each process accumulates the $k$ values that it owns \textbf{for each} of the $\ell$ columns that the receiving process requires from it (here $\ell$ is the number of rows owned by the receiving process) and transmits them all to the receiving process in a single gather.
There would then be one gather per process instead of one per row of the matrix, which would reduce the number of communication steps to $p \log p$, with each message carrying $k\ell$ values instead of $k$.

I have implemented my first algorithm in MPI; it does a gather for each row of the transposed matrix.
}
\end{enumerate}
\end{document}