diff --git a/cs677/hw4/hw.tex b/cs677/hw4/hw.tex index 82c5deb..700bc92 100644 --- a/cs677/hw4/hw.tex +++ b/cs677/hw4/hw.tex @@ -37,17 +37,45 @@ \item[2.]{ We are to transpose an $n \times n$ matrix that is initially rowwise block-decomposed among $p$ processes. - Assume that $p \leq n$ and if any process $i$ owns $k > 1$ rows of - the matrix, then the $k$ rows that it owns are contiguous. + Originally I interpreted the problem description to mean that after + the matrix was transposed, then process $i$ would be responsible + for the same column numbers that it originally had rows for. + But, I realized this would not make much sense since then no communication + would be needed at all - the process could simply use the same data + that it had in its rows as its column data and transpose its own data. + So, instead I interpreted the problem to mean that a given process $i$ + owns certain rows of the matrix before the transposition. + After the transposition, it still owns the same row indices but + these rows should contain the transposed matrix data. - Then, the procedure to transpose the matrix is as follows: - For each process $i$, a gather operation is performed. + I assume that $p \leq n$ and if any process $i$ owns $k > 1$ rows of + the matrix, then the $k$ rows that it owns are contiguous. + I also assume that a gather operation can be done in $\log p$ + communication steps. + + Then, my procedure to transpose the matrix is as follows: + For each row $r \in \{0, 1, \ldots, n-1\}$, + a gather operation is performed. + Let $i$ be the process that owns row $r$. In this gather operation, each process $j$ sends to $i$ the $k$ - values it owns in column $i$ (where $k$ is the number of rows - assigned to process $j$). + values it owns from column $r$ (where $k$ is the number of rows + assigned to process $j$). Thus, at the end of each gather operation process $i$ has received - the entire contents of column $i$ of the matrix. 
- This algorithm takes $p \log p$ steps to complete. + the entire contents of column $r$ of the matrix + (and has stored them now in row $r$ of a receive matrix in + order to perform the transposition). + This algorithm takes $n \log p$ communication steps to complete. + + This algorithm could be tweaked to have each process accumulate + the $k$ values of column $r$ that it owns \textbf{for each} + of the $\ell$ columns that the receiving process will require + from it and transmit these all to the receiving process in a single + gather. + This would mean there was one gather per process instead of per + row of the matrix, which would reduce the number of communication + steps to $p \log p$. + I have implemented in MPI my first algorithm which does a gather + for each row in the transposed matrix. }
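The gather-per-row procedure described in the added text can be sketched as a serial simulation. This is not the author's MPI implementation; it is a minimal illustration in plain Python, with no real message passing. The names `block_rows` and `transpose_by_row_gathers` are hypothetical, and the way leftover rows are distributed when $p$ does not divide $n$ is an assumption (the write-up only requires that each process own contiguous rows).

```python
def block_rows(n, p):
    """Contiguous row ranges per process (assumed layout: the first
    n % p processes each get one extra row)."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def transpose_by_row_gathers(matrix, p):
    """Simulate one gather per row r: the owner of row r collects column r."""
    n = len(matrix)
    owned = block_rows(n, p)
    # local[j] maps each global row index owned by process j to that row's data.
    local = [{r: list(matrix[r]) for r in rows} for rows in owned]
    # recv[j] is process j's receive matrix, keyed by global row index.
    recv = [dict() for _ in range(p)]
    gathers = 0
    for r in range(n):
        # The root of this gather is the process that owns row r.
        root = next(j for j, rows in enumerate(owned) if r in rows)
        column = []
        for j in range(p):
            # Process j contributes its k entries of column r, in row order.
            column.extend(local[j][row][r] for row in owned[j])
        recv[root][r] = column  # column r is stored as row r of the transpose
        gathers += 1
    # Reassemble the distributed result in global row order for inspection.
    result = [recv[j][r] for j in range(p) for r in owned[j]]
    return result, gathers
```

For an $n \times n$ input the simulation performs exactly $n$ gathers. Under the stated assumption that each gather costs $\log p$ communication steps, this matches the $n \log p$ total given in the text.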