Wednesday, November 11, 2009

PRES

Debugging tools have come a long way--who would have thought that with the push of a button you could step through code in different languages, running across multiple machines, in a seamless environment?  These tools are invaluable, but what do you do before you can reproduce a bug in an environment you've configured for debugging?  Enter the nasty no-repro bug.  These are the gremlins that all software developers dread--an issue that has been reported or confirmed, but whose details we don't know.  How do we handle this?  If the error is reproducible in a certain environment, we might have the luxury of instrumenting the code temporarily and incurring the associated high overhead, or we can just start guessing (see the Speculation pattern ;-).

Uniprocessor debugging is hard enough, because issues can originate anywhere across space (code) and time.  On a multiprocessor system we introduce additional dimensions in which errors can occur, so we end up with something that intuitively feels like geometric growth of debugging complexity.
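To make those extra dimensions concrete, here is a minimal Java sketch (my own hypothetical example, not from the paper) of the kind of bug that is easy to write and miserable to reproduce: a data race whose outcome depends on thread interleaving, so each run can print a different result.

```java
// Minimal data-race sketch: the final count depends on how the two
// threads interleave, so the "bug" (a lost update) only appears on
// some runs -- exactly the kind of nondeterminism that record/replay
// tools like PRES aim to capture.
public class LostUpdate {
    static int counter = 0; // shared and unsynchronized on purpose

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write: not atomic
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000, but lost updates make the result vary run to run.
        System.out.println("counter = " + counter);
    }
}
```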

A tool like PRES is a welcome addition to the developer's troubleshooting arsenal.  I agree that bugs don't *need* to be reproducible on the very first try in the lab--especially if the replays are entirely automated, a small number of replay attempts can easily be tolerated (and really, what other choice do you have if you can't withstand the overhead of exhaustive logging?).

The sources of nondeterminism that make this whole process difficult can themselves be hard to reproduce, since some of them are generated by low-level constructs like interrupts.  Virtual machine technology can help alleviate this by virtualizing things that were once left purely to hardware control, with little possibility of intervention--with a VM underneath, a tool like PRES could decide when things like interrupts get delivered.

Monday, November 9, 2009

Loop/Task/Graph

Loop Parallelism:

I haven't specifically refactored from sequential to parallel, but the approach in the pattern seems very logical.  If you have a stable build, you can do unit testing by comparing the output of the sequential version to that of the current stage of evolution, and even dynamically generate test cases, since you already have a working (sequential) version.
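As a rough sketch of that idea (hypothetical Java, assuming the loop iterations are independent), the sequential loop stays around as the oracle and the parallel refactoring is checked against it:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical loop-parallelism refactoring: the sequential loop is kept
// around as the test oracle, and the parallel version is compared to it.
public class LoopParallelSketch {
    // Original sequential loop (the "stable build" we trust).
    static double[] scaleSequential(double[] in, double factor) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * factor;
        }
        return out;
    }

    // Parallel refactoring: iterations are independent, so a parallel
    // stream over the index range is a safe transformation.
    static double[] scaleParallel(double[] in, double factor) {
        double[] out = new double[in.length];
        IntStream.range(0, in.length).parallel()
                 .forEach(i -> out[i] = in[i] * factor);
        return out;
    }

    public static void main(String[] args) {
        // "Dynamically generated" test case: random input, with the
        // sequential output used as the expected value.
        double[] input = new java.util.Random(42).doubles(1_000_000).toArray();
        double[] expected = scaleSequential(input, 2.5);
        double[] actual = scaleParallel(input, 2.5);
        System.out.println("match = " + Arrays.equals(expected, actual));
    }
}
```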

I have used some tools in the .NET environment to profile execution for performance bottlenecks, though my course of action in those cases was sequential optimization rather than parallelization.  I could see that some applications might not require such tools to isolate the best portion to parallelize, but a profiler is more "fact based": what you think is the bottleneck might not truly be one, and you will likely make better decisions when assisted by tools.

I could definitely see that a number of transformation/evolution iterations would be needed before seeing a performance improvement.  Likely, the early transformations add overhead to support the first pieces of parallelization, overhead that gets amortized over more parallel execution down the road.

Task Queue:

I think that a task queue could be an implementation-level detail of a fork/join or divide-and-conquer algorithm.  Fork/join, for example, deals with how to decompose a problem into smaller subproblems that can be executed in parallel, and a task queue would be one way of executing those tasks once they are defined.
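A minimal sketch of that relationship (hypothetical Java, not taken from the pattern's sample code): the decomposition decides *what* the subproblems are, while the queue and its workers decide *how* they get run.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a task queue as the execution substrate beneath a
// fork/join-style decomposition.
public class TaskQueueSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public TaskQueueSketch(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run(); // block until a task is available
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shut down quietly
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    // A fork/join algorithm would call this for each subproblem it creates.
    public void submit(Runnable task) {
        queue.add(task);
    }

    public static void main(String[] args) throws InterruptedException {
        TaskQueueSketch pool = new TaskQueueSketch(4);
        for (int i = 0; i < 8; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("subproblem " + id
                    + " on " + Thread.currentThread().getName()));
        }
        Thread.sleep(500); // crude wait for the demo tasks to finish
    }
}
```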

As far as the existence of source code goes, this seems like one of the patterns that didn't need it quite so much--I think everyone gets the idea of a queue, so just the variations from the standard conception would probably have been sufficient.

Graph Partitioning:

I'll have to admit that quite a lot of this pattern was beyond my realm of expertise.  It would seem that you would need a complex, dynamic (from execution to execution), and computationally intensive problem to warrant the overhead of this pattern.  Additionally, given the complexity of implementing it, you'd better *really* need that performance boost to risk the bugs that might creep into the program due to this complexity.

Tuesday, November 3, 2009

Task Parallelism/Recursive Splitting/Discrete Event

Task parallelism seems like a pretty straightforward pattern from its title.  Its content, however, is not quite what I expected.  It reads as a summary of other patterns in many regards, and its actual content seems to be more of an architectural-level performance tuning guide than a way of structuring a solution.  Most of my experience has been on the .NET framework on Windows, and I've often used the thread pool to achieve a task-parallelism style of implementation.  The software I have worked on has generally used large-grained divisions, so I haven't run into any situations where better hardware support for a task queue would be necessary.  However, as the progression of computer speed is pushed toward parallelism rather than clock rate, and algorithms need to follow suit to provide the expected speedup, this type of hardware might become necessary to leverage an ever-more-parallel yet still diverse install base.
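For comparison, Java's executor framework plays the same role as the .NET thread pool I describe; here is a hedged sketch, with made-up task names, of coarse-grained task parallelism on a shared pool:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Rough Java analogue of the .NET thread-pool style described above:
// independent, coarse-grained tasks handed to a shared pool.
public class TaskParallelSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // Hypothetical coarse-grained tasks; in practice these might be
        // independent report sections or batch items.
        List<Callable<String>> tasks = List.of(
                () -> "processed batch A",
                () -> "processed batch B",
                () -> "processed batch C");

        for (Future<String> f : pool.invokeAll(tasks)) {
            System.out.println(f.get()); // all tasks have finished here
        }
        pool.shutdown();
    }
}
```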

In recursive splitting, the granularity of execution is controlled by the size of the recursion's base case.  In most parallel algorithms there is a need to balance perceived running time (in a wall-clock sense) against the total number of operations executed, which may be important in multi-user systems/servers or under tight resource constraints.  Finding the balance between these factors is quite hard to do at the abstract level--different hardware with different parallel facilities will execute the same algorithm in very different ways, so the right balance likely depends on the hardware, the load, and the responsiveness and resource-limitation requirements of the system.  Since recursion and iteration are essentially isomorphic, any algorithm that iterates over some data or performs a series of steps could be considered recursive, though not very naturally in some cases.  Hence this pattern, taken indiscriminately, can be applied to a huge class of problems.
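A small sketch of how the base case becomes the tuning knob (hypothetical Java using the standard ForkJoinPool; the THRESHOLD value is exactly the hardware- and load-dependent balance discussed above):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hedged sketch of recursive splitting: THRESHOLD is the "size of the
// base case" that controls granularity; its best value depends on the
// hardware and load.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // granularity tuning knob
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {           // base case: just iterate
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;            // split into two subproblems
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute right, then join left
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long total = ForkJoinPool.commonPool()
                                 .invoke(new SumTask(data, 0, data.length));
        System.out.println("total = " + total); // expect 1000000
    }
}
```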

I haven't worked on a lot of "pure" parallel systems, so I would have to say that message passing, as viewed from an SOA perspective, is the environment I'm most familiar with.  I feel that messages are a pretty natural way to separate parallel executions, because they unclutter the interactions.  An example of a system where the ordering of messages matters might be a banking application--overdraft fees are charged whenever a customer withdraws (through check, ATM, etc.) more than their account balance, so if messages were processed out of order, overdraft conditions would not occur consistently or in the desired order.  With regard to deadlock detection, I think the domain in which the algorithm is implemented determines whether timeouts or deadlock detection is the better method.  If all executions are tightly bounded with low standard deviation in running times, then timeouts are quite appropriate, since the system can be fairly certain within a reasonable amount of time that deadlock has occurred.  In other situations, where running times vary greatly and/or it's very costly to roll back a transaction unnecessarily, the cost of deadlock detection may be well worth it.
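For the timeout side of that trade-off, here is a hedged sketch (hypothetical account-transfer code, not from any real banking system) where failing to acquire both locks within a bound is treated as a probable deadlock and the operation backs off instead of waiting forever:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the timeout approach: if running times are tightly
// bounded, failure to acquire a lock within a generous timeout is treated
// as "probably deadlocked" and the operation releases what it holds.
public class TimeoutTransfer {
    static boolean transfer(ReentrantLock from, ReentrantLock to, Runnable move)
            throws InterruptedException {
        // Try to take both locks, giving up (and releasing) on timeout
        // instead of blocking forever and risking deadlock.
        if (from.tryLock(100, TimeUnit.MILLISECONDS)) {
            try {
                if (to.tryLock(100, TimeUnit.MILLISECONDS)) {
                    try {
                        move.run();      // both accounts locked: do the transfer
                        return true;
                    } finally {
                        to.unlock();
                    }
                }
            } finally {
                from.unlock();
            }
        }
        return false; // caller can retry, ideally with backoff/jitter
    }
}
```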

Armstrong Ch5

AND supervisors make sense when a bunch of processes are collaborating and building a collective state towards some goal.  In these cases, the supervisor couldn't restart just one of these processes, because it would introduce inconsistent state, and hence all children must be restarted if any need to be.

OR supervisors make sense when each process being supervised is essentially independent, or at least where, when processes do communicate, they don't rely on any monotonic progression of state.
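Supervisors are an Erlang construct, but purely to illustrate the AND/OR distinction, here is a hypothetical Java sketch (the Child/start/stop API is my own invention, not Erlang's actual supervisor behaviour):

```java
import java.util.List;

// Purely illustrative sketch of the AND/OR distinction; the Child API and
// restart logic here are hypothetical, not Erlang's real implementation.
interface Child {
    void start();
    void stop();
}

class Supervisor {
    enum Strategy { AND, OR } // AND = restart all children; OR = restart only the failed one

    private final Strategy strategy;
    private final List<Child> children;

    Supervisor(Strategy strategy, List<Child> children) {
        this.strategy = strategy;
        this.children = children;
    }

    void onChildFailure(Child failed) {
        if (strategy == Strategy.AND) {
            // Children share collective state: restart every one of them
            // so no child is left holding state the others no longer have.
            for (Child c : children) c.stop();
            for (Child c : children) c.start();
        } else {
            // Children are independent: restarting just the failed child
            // cannot leave the others inconsistent.
            failed.stop();
            failed.start();
        }
    }
}
```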

Erlang's method for restartability does not necessarily make restarts cheap, but it certainly provides a good framework through which the programmer can attempt to make them so.  Providing the AND/OR hierarchical granularity certainly helps isolate the scope of restarts, but there could certainly be situations where a cascade of restarts occurs, each at a larger granularity, until an entire subsystem is restarted--in a different model the programmer might have known up front that a whole-subsystem restart was necessary and bypassed the cascade altogether.

Using the process/supervisor model keeps the programmer aware of the unit of restart their code may be subjected to, and therefore allows the system to be implemented so that restarts can be enacted at multiple levels--more naive approaches wouldn't necessarily have the clarity of separation to support this.  Because of this, faults can be isolated to a particular process and its supervisor rather than unwinding the stack to some arbitrary point, and hence fault tolerance is greatly enhanced.

A rejuvenation model for fault recovery could certainly reduce some of the overhead associated with restarting a big AND group, but it's possible that any benefit to be gained by this would be nullified by the additional complexity involved in supporting the rejuvenation process.  Also, I could imagine that rejuvenation isn't as robust as a restart, and therefore additional errors may be introduced by trying to support such a model.

Erlang prescribes an adherence to specification such that if any deviation is detected, an exception should be thrown.  This is contrary to the all-too-common reality in which a programmer guesses and/or imposes their own personal beliefs about how the system should operate, complicating the matter.

Providing a good mental framework, even if not a concrete software framework, for managing the supervisor/task relationship goes a long way toward influencing the way a piece of software is built.  As such, even though the supervisor logic doesn't come "out of the box", adopting this paradigm still provides a solid separation of concerns that is well suited to the task at hand, and is therefore superior to more traditional organizations.

I don't see that Erlang's model for restarts is all that different from the crash-only model, except for the granularity involved--whereas the crash-only model is defined at the "component" level, Erlang's model decomposes this into a granular hierarchy, each piece of which does essentially the same thing.  The Erlang model both benefits and suffers from the phenomenon I described earlier, where a cascade of restarts may be undertaken, ultimately resulting in a whole-subsystem restart.  In that case the crash-only system would have been faster, but in most cases there is some distinct scope beyond which an error doesn't persist, and thus the Erlang model provides a better way of handling things.