Monday, October 19, 2009

Map Reduce

The Map Reduce pattern is quite similar to the Fork/Join pattern.  However, whereas Fork/Join recursively subdivides one problem into smaller ones and combines the results as the stack unwinds, Map Reduce operates over a large list of independent problems and provides a mechanism to gather the results.  The primary difference, then, is whether the units of work are decompositions of a large monolithic problem, or samples in a search space that are independent from one another but culminate in a single solution.
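To make the shape of the pattern concrete, here is a rough sketch in Python.  The choice of language and of the pi-estimation task is purely for illustration and isn't tied to any particular framework; the point is just that each "map" task is completely independent, and the "reduce" step gathers the partial results at the end.

```python
# A minimal sketch of the pattern: map a function over many independent
# inputs in parallel, then reduce the results once they are all available.
# Estimating pi by random sampling stands in for any embarrassingly
# parallel workload.
import random
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def map_task(num_points):
    """Independent unit of work: count random points inside the unit circle."""
    hits = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def reduce_results(a, b):
    """Combine two partial results; cheap relative to the map step."""
    return a + b

if __name__ == "__main__":
    tasks = [100_000] * 16  # sixteen independent sub-problems
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(map_task, tasks))
    total_hits = reduce(reduce_results, partials)
    print("pi is roughly", 4 * total_hits / sum(tasks))
```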

Errors should be handled in a configurable way, allowing the application developer to specify behavior on a case-by-case basis.  For some problems, the failure of any subproblem may prevent the overall problem from being solved, whereas in, say, a Monte Carlo simulation, the failure of one task may be essentially inconsequential, provided the failure isn't related to a flaw in the underlying model.  As such, the application developer would like to either ignore the error, schedule the task for re-execution (in case the error was transient), or terminate the computation.  We certainly wouldn't want the framework deciding on its own to terminate the computation, potentially losing all existing results, over a small anomaly in one task.  Hence, a configurable error-handling system is the only way to make a framework general-purpose enough.
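Here's roughly what I have in mind, again sketched in Python.  The OnError policy names and the run_tasks helper are made up for illustration rather than taken from any real framework; the idea is only that the developer chooses whether a failed task is ignored, retried, or allowed to abort the whole run.

```python
# Sketch of configurable error handling for the map step: the application
# developer supplies a policy instead of the framework deciding.
from concurrent.futures import ProcessPoolExecutor, as_completed
from enum import Enum

class OnError(Enum):
    IGNORE = "ignore"   # drop the failed task and keep going
    RETRY = "retry"     # resubmit, in case the failure was transient
    ABORT = "abort"     # terminate the whole computation

def run_tasks(map_fn, inputs, policy=OnError.IGNORE, max_retries=3):
    """Map map_fn over inputs in parallel, applying the chosen error policy."""
    results = []
    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(map_fn, item): (item, 0) for item in inputs}
        while futures:
            for future in as_completed(list(futures)):
                item, attempts = futures.pop(future)
                try:
                    results.append(future.result())
                except Exception:
                    if policy is OnError.ABORT:
                        raise
                    if policy is OnError.RETRY and attempts < max_retries:
                        futures[pool.submit(map_fn, item)] = (item, attempts + 1)
                    # OnError.IGNORE, or retries exhausted: skip this task
    return results

# Hypothetical usage, where my_simulation_step is the per-task work function:
# results = run_tasks(my_simulation_step, inputs, policy=OnError.RETRY)
```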

I haven't ever used this pattern, and I actually had to do a little bit of Wikipedia-ing to get a better idea of its real applications.

This pattern excels when the majority of the work is in the "map" function and can be executed independently.  When there are many interdependencies in the "reduce" function, the pattern won't scale well, as reductions may need to be deferred until other mappings have been performed.  That case may simply be outside the scope of the pattern, but if so, it belongs on the list of its weaknesses.
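As a small contrast, when the combine function is associative (plain addition below), partial reductions can be folded in as each mapping completes, so the reduce step never has to wait for the whole result list.  This is again just an illustrative sketch, not any framework's API.

```python
# With an associative reduce, results can be combined in whatever order the
# map tasks finish, so the reduce step is never a serial bottleneck.
from concurrent.futures import ProcessPoolExecutor, as_completed

def square(n):
    return n * n

if __name__ == "__main__":
    total = 0
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(square, n) for n in range(100)]
        for future in as_completed(futures):  # combine in completion order
            total += future.result()
    print(total)  # 328350
```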
