Matrix multiply in parallel - is a different result ok?
By Patrick Leonard, VP Engineering & Product Strategy
When moving a production application from one system to another, extensive testing is generally done to ensure, among other things, that results from the new systems agree with expected results from the old system. This is true whether changing operating systems, hardware, or anything else.
For example, many financial services firms have moved from Unix systems to Linux for a variety of good reasons. When moving quantitative analysis applications, they had to verify - to multiple significant digits - that calculations done on Linux would not be different from what they got in the old system.
Different is not always wrong - sometimes a different new result is “more correct” - but it takes effort and time to verify that and make sure.
Now many companies are moving from sequential processing to parallel processing. This can actually be a bit trickier. Certain mathematical algorithms calculate differently in a parallel environment than in a sequential environment. This may not have anything to do with the implementation, it is often just the nature of the numbers.
Matrix multiplication is an example of this. Since matrix multiplication is not commutative in most cases, multiplying a matrix in parallel can result in a different outcome because the multiplication and subsequent addition is necessarily done in a different order.
Here is an example (thanks to David Haney):
Given two 4 x 4 matrices (A and B), you would normally calculate the result in 0,0 as:
(A00 * B00) + (A01 * B10) + (A02 * B20) + (A03 * B03)
If you change the order of operations though, like the following (note the parens):
((A00 * B00) + (A01 * B10)) + ((A02 * B20) + (A03 * B03))
Then you might see different results, depending on how the floating point rounding turns out. You probably won’t see much skew at this scale (especially if all of the numbers are roughly the same magnitude), but if you were dealing with an 1024 x 1024 matrix, you would probably start seeing some variation.
There are some algorithms for breaking up a matrix multiply that allow you to maintain equivalent results to sequential, but still at least partially execute the code in parallel, but from what we’ve seen those methods look like they’re less efficient than algorithms that do some amount of reordering.
The outcome, although different, may not be any less “correct”. But that difference may have business consequences that need to be planned for. Regardless of the software programming model and technology used to go parallel, this is something to be mindful of.
Del.icio.us | Technorati | Digg | Slashdot
