diff --git a/cs677/hw4/hw.tex b/cs677/hw4/hw.tex
index 82c5deb..700bc92 100644
--- a/cs677/hw4/hw.tex
+++ b/cs677/hw4/hw.tex
@@ -37,17 +37,45 @@
 \item[2.]{
     We are to transpose an $n \times n$ matrix that is initially
     rowwise block-decomposed among $p$ processes.
-    Assume that $p \leq n$ and if any process $i$ owns $k > 1$ rows of
-    the matrix, then the $k$ rows that it owns are contiguous.
+    Originally, I interpreted the problem description to mean that after
+    the matrix was transposed, process $i$ would be responsible
+    for the columns whose indices match the rows it originally owned.
+    But I realized this would not make much sense, since then no communication
+    would be needed at all: the process could simply use the same data
+    that it had in its rows as its column data and transpose its own data.
+    So, instead I interpreted the problem to mean that a given process $i$
+    owns certain rows of the matrix before the transposition.
+    After the transposition, it still owns the same row indices but
+    these rows should contain the transposed matrix data.
 
-    Then, the procedure to transpose the matrix is as follows:
-    For each process $i$, a gather operation is performed.
+    I assume that $p \leq n$ and if any process $i$ owns $k > 1$ rows of
+    the matrix, then the $k$ rows that it owns are contiguous.
+    I also assume that a gather operation can be done in $\log p$
+    communication steps.
+
+    Then, my procedure to transpose the matrix is as follows:
+    For each row $r \in \{0, 1, \ldots, n-1\}$,
+        a gather operation is performed.
+    Let $i$ be the process that owns row $r$.
     In this gather operation, each process $j$ sends to $i$ the $k$
-    values it owns in column $i$ (where $k$ is the number of rows
-    assigned to process $j$).
+        values it owns from column $r$ (where $k$ is the number of rows
+        assigned to process $j$).
     Thus, at the end of each gather operation process $i$ has received
-    the entire contents of column $i$ of the matrix.
-    This algorithm takes $p \log p$ steps to complete.
+        the entire contents of column $r$ of the matrix
+        (and has stored them in row $r$ of a receive matrix in
+         order to perform the transposition).
+    This algorithm takes $n \log p$ communication steps to complete.
+    
+    This algorithm could be tweaked so that each sending process accumulates
+    the $k$ values it owns from column $r$ \textbf{for each}
+    of the $\ell$ columns that the receiving process requires,
+    and transmits all $k \ell$ values to the receiving process in a single
+    gather.
+    This would mean one gather per process instead of one per
+    row of the matrix, which would reduce the number of communication
+    steps to $p \log p$.
+    In MPI, I have implemented my first algorithm, which does a gather
+    for each row of the transposed matrix.
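+
+    As a rough illustration of the two step counts, assume for the sake
+    of example that $\log$ is base 2, $n = 8$, and $p = 4$:
+    \[
+        n \log p = 8 \cdot 2 = 16,
+        \qquad
+        p \log p = 4 \cdot 2 = 8.
+    \]
+    In general the ratio of the two counts is
+    $(n \log p) / (p \log p) = n / p$, so the tweak saves steps exactly
+    when $p < n$. It does not change the total amount of data moved,
+    since each gather then carries $\ell$ times as many values.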
 }