Times for big.seq and big.unk on mp:

Sequential: 5.33091 sec

Threaded:
num_threads    time (sec)
1              10.7522
2              11.1476
4              15.0832
8              36.2662

The threaded version is generally slower than the sequential version because each step involves considerably more logic: it has to calculate indices into the matrix and traverse it along diagonals instead of row-wise. With more threads, the threaded version also has to synchronize more (each thread does a barrier wait at the end of each step). Since some threads have little work to do, especially at the beginning and end of the algorithm, more time is spent on synchronization than on the actual computation.

In contrast, when the "unknown" search sequence grows in length and the known database sequence shrinks, so that the two sequences are "more even" in length, the threaded version scales much better with the number of threads (although it remains slower than the sequential version), because less synchronization is required relative to the amount of actual computation being done:

Times for more-even.seq and more-even.unk on mp:

Sequential: 1.34995 sec

Threaded:
num_threads    time (sec)
1              10.0506
2              4.26101
4              2.7733
8              2.64885
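
The relationship between sequence lengths and synchronization cost can be illustrated with a rough calculation. The sketch below assumes one barrier wait per anti-diagonal step, as described above, and counts how many matrix cells are available per barrier for a lopsided matrix versus an even one with the same total number of cells. The function names and the matrix sizes are hypothetical and chosen only for illustration; they are not the actual lengths of the input files.

```python
def diagonal_lengths(rows, cols):
    """Return the number of cells on each anti-diagonal of a rows x cols matrix."""
    # On diagonal d (where i + j == d), the row index i runs from
    # max(0, d - cols + 1) to min(d, rows - 1), inclusive.
    return [
        min(d, rows - 1) - max(0, d - cols + 1) + 1
        for d in range(rows + cols - 1)
    ]


def cells_per_barrier(rows, cols):
    """Average number of cells computed between two barrier waits."""
    steps = rows + cols - 1  # one barrier per anti-diagonal (assumed)
    return rows * cols / steps


def short_diagonals(rows, cols, num_threads):
    """Count steps where at least one thread has no cell to work on."""
    return sum(1 for n in diagonal_lengths(rows, cols) if n < num_threads)


if __name__ == "__main__":
    # Hypothetical sizes with the same total work (65536 cells each).
    for name, rows, cols in [("lopsided", 16, 4096), ("even", 256, 256)]:
        print(
            name,
            "barriers:", rows + cols - 1,
            "cells per barrier: %.1f" % cells_per_barrier(rows, cols),
            "steps with idle threads (8 threads):", short_diagonals(rows, cols, 8),
        )
```

For the same total work, the even matrix needs far fewer barrier waits and its longest diagonal is much longer, so each step has more cells to divide among the threads. This is consistent with the timings above, though the measured numbers also depend on factors this sketch does not model, such as barrier latency and cache behavior.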