Wednesday, November 11, 2009

PRES

Debugging tools have come a long way--who would have thought that with the push of a button you could step through code in different languages running across multiple machines in a seamless environment.  These tools are invaluable, but what do you do when you can't reproduce a bug within a configured environment?  Enter the nasty no-repro bug.  These are the gremlins that all software developers dread--an issue either reported or confirmed, the details of which we don't know.  How do we handle this?  If the error is reproducible in a certain environment, we might have the luxury of instrumenting the code temporarily and incurring the associated high overhead, or we can just start guessing (see the Speculation pattern ;-).

Uniprocessor debugging is hard because issues can be sourced across space (code) and time.  On a multiprocessor system, we introduce additional dimensions in which errors can occur, so we end up with something that feels intuitively like geometric growth of debugging complexity.

A tool like PRES is a welcome addition to the developer's troubleshooting arsenal.  I agree that bugs don't *need* to be reproduced on the very first replay in the lab--especially if the replays are entirely automated, a small number of replay attempts can easily be tolerated (and really, what other choice do you have if you can't withstand the overhead of exhaustive logging?).

The sources of nondeterminism that make this whole process difficult can be hard to reproduce, because some of them are generated by low-level constructs like interrupts.  Virtual machine technology can help alleviate some of this by virtualizing things that were once purely hardware-controlled, with little possibility of control--now a tool like PRES could decide when things like interrupts get delivered.

Monday, November 9, 2009

Loop/Task/Graph

Loop Parallelism:

I haven't specifically refactored from sequential to parallel, but the approach in the pattern seems very logical.  If you have a stable build, you can do unit testing by comparing the output of the sequential version to that of the current stage of evolution, and you can even dynamically generate test cases since you have a working (sequential) version.
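To make that concrete, here's a minimal sketch (in Python, with a hypothetical work function standing in for the real loop body) of using the sequential version as the test oracle for the parallelized loop:

    from concurrent.futures import ProcessPoolExecutor

    def work(x):
        return x * x + 1          # stand-in for the real loop body

    def sequential(data):
        return [work(x) for x in data]

    def parallel(data, workers=4):
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # map preserves input order, so results line up with the sequential run
            return list(pool.map(work, data))

    if __name__ == "__main__":
        data = list(range(1000))
        assert sequential(data) == parallel(data)   # sequential output is the oracle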

I have used some tools in the .NET environment to profile execution for performance bottlenecks, though my course of action in those cases was sequential optimization rather than parallelization.  I could see that some applications might not require such tools to isolate the best portion to parallelize, but I would think that a profiler is more "fact based": what you think is the bottleneck might not truly be one, and you would likely make better decisions when assisted by tools.

I could definitely see that a bunch of different transformation/evolution iterations would be needed before seeing a performance improvement.  Likely, early transformations would add overhead to support the first pieces of parallelization, overhead that would be amortized over more parallel execution down the road.

Task Queue:

I think that a task queue could be an implementation-level detail of a fork/join or divide-and-conquer algorithm.  Fork/join, for example, deals with how to decompose a problem into smaller subproblems that can be executed in parallel, and a task queue would be one way of executing those tasks once they are defined.
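As a rough illustration, here's a minimal task queue sketch (Python threads and a shared Queue, all names my own): a fork/join or divide-and-conquer layer would decide what tasks to enqueue, while this layer only decides how they get executed.

    import threading, queue

    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = tasks.get()
            if task is None:              # sentinel: no more work
                break
            value = task()                # a task is just a callable here
            with results_lock:
                results.append(value)

    workers = [threading.Thread(target=worker) for _ in range(4)]
    for w in workers:
        w.start()

    for i in range(10):                   # the "fork" side enqueues subproblems
        tasks.put(lambda i=i: i * i)

    for _ in workers:                     # one sentinel per worker shuts them down
        tasks.put(None)
    for w in workers:
        w.join()
    print(sorted(results))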

As far as the inclusion of source code goes, this seems like one of the patterns that didn't need it quite so much--I think everyone gets the idea of a queue, so just covering the variations from the standard conception would probably have been sufficient.

Graph Partitioning:

I'll have to admit, quite a lot of this pattern was beyond my realm of expertise.  It would seem to me that you would have to have quite a complex, dynamic (from execution to execution), and intensive problem to solve to warrant the overhead of this pattern.  Additionally, given the complexity of implementing it, you'd better *really* need that performance boost to risk the bugs that might creep into the program due to this complexity.

Tuesday, November 3, 2009

Task Parallelism/Recursive Splitting/Discrete Event

Task parallelism seems like a pretty straightforward pattern from its title.  Its content, however, is not quite what I expected.  It seems to be a summary of other patterns in many regards, and its actual content reads more like a performance tuning guide at the architectural level than a way of structuring a solution.  Most of my experience has been on the .NET framework on Windows, and I've often used the thread pool to achieve a task-parallelism style of implementation.  The software that I have worked on has generally utilized large-grained divisions, so I haven't run into any situations where better hardware support for task queues would be necessary.  However, as increases in computer speed come from parallelism rather than the clock, and algorithms need to follow suit to provide the expected speedup, I can see this type of hardware becoming necessary to leverage an ever-more-parallel yet still diverse install base.
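For what it's worth, here is the rough shape of what I mean, translated into a Python thread pool sketch (the task functions are made-up placeholders for coarse-grained work):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def load_report():   return "report"
    def refresh_cache(): return "cache"
    def index_files():   return "index"

    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(task) for task in (load_report, refresh_cache, index_files)]
        for f in as_completed(futures):   # independent tasks complete in any order
            print(f.result())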

In recursive splitting, the granularity of execution is controlled by the size of the base case of the recursion.  In most parallel algorithms, there is a need for balance between perceived running time (in a wall-clock sense) and the total number of operations executed, as may be important in multi-user systems/servers or when there are tight resource constraints.  Finding the balance between these factors is something that's quite hard to do at the abstract level--different hardware with different parallel facilities will execute the same algorithm in very different ways, and therefore the right balance is likely dependent upon the hardware, load, and required responsiveness and resource limitations of the system.  Since recursion and iteration are essentially isomorphic, any algorithm that iterates over some data or performs a series of steps could be considered recursive, though not very naturally in some cases.  Hence, this pattern, taken indiscriminately, can be applied to a huge class of problems.
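A minimal sketch of what I mean by the base case controlling granularity, assuming a simple sum as the problem (with CPython threads this only shows the structure, not real speedup):

    from concurrent.futures import ThreadPoolExecutor

    THRESHOLD = 10_000                    # base-case size: the granularity knob

    def split(data, pool):
        # recursively split until the chunk is small enough, then hand it to the pool
        if len(data) <= THRESHOLD:
            return [pool.submit(sum, data)]
        mid = len(data) // 2
        return split(data[:mid], pool) + split(data[mid:], pool)

    with ThreadPoolExecutor() as pool:
        partials = split(list(range(100_000)), pool)
        print(sum(f.result() for f in partials))   # "join": combine the partial sums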

I haven't worked on a lot of "pure" parallel systems, so I would have to say that message passing, as it would be viewed from an SOA perspective, is the environment I'm more familiar with.  I feel that messages are a pretty natural way to separate parallel executions, because they unclutter the interactions.  An example of a system where the ordering of messages matters might be a banking application--overdraft fees are charged whenever a customer withdraws (through check, ATM, etc.) more than their account balance.  If messages were processed out of order, overdraft conditions would not occur in the desired or consistent order.  With regards to deadlock detection, I think it is a matter of the domain in which the algorithm is being implemented as to whether timeouts or deadlock detection is the better method.  If all executions are tightly bounded with low-variance running times, then timeouts would be quite appropriate, as the system could be quite certain that deadlock has occurred within a reasonable amount of time.  In other situations, where running times may vary greatly and/or it's very costly to roll back a transaction unnecessarily, the cost of deadlock detection may be well worth it.
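A tiny worked example of the overdraft point, with made-up numbers: the same two messages, applied in different orders, charge a fee in one case and not in the other.

    OVERDRAFT_FEE = 35

    def apply_messages(balance, messages):
        fees = 0
        for kind, amount in messages:
            balance += amount if kind == "deposit" else -amount
            if kind == "withdraw" and balance < 0:
                fees += OVERDRAFT_FEE
        return balance, fees

    msgs = [("deposit", 100), ("withdraw", 80)]
    print(apply_messages(50, msgs))                    # deposit first: (70, 0), no fee
    print(apply_messages(50, list(reversed(msgs))))    # withdraw first: (70, 35), fee charged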

Armstrong Ch5

AND supervisors make sense when a bunch of processes are collaborating and building a collective state towards some goal.  In these cases, the supervisor couldn't restart just one of these processes, because doing so would introduce inconsistent state; hence all children must be restarted if any one of them needs to be.

OR supervisors make sense when each process being supervised is essentially independent, or at least where, when processes do communicate, they don't rely on any monotonic progression of state.
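A minimal sketch of the two policies (in Python rather than Erlang, since that's what I know; they correspond roughly to OTP's one_for_all and one_for_one restart strategies):

    class Child:
        def __init__(self, name): self.name = name
        def restart(self): print("restarting", self.name)

    class Supervisor:
        def __init__(self, children, strategy):
            self.children = children      # each child exposes restart()
            self.strategy = strategy

        def child_failed(self, failed):
            if self.strategy == "one_for_all":
                # AND: the children share state, so restart every one of them
                for child in self.children:
                    child.restart()
            else:
                # OR ("one_for_one"): children are independent, restart only the one that died
                failed.restart()

    kids = [Child("a"), Child("b"), Child("c")]
    Supervisor(kids, "one_for_all").child_failed(kids[1])   # restarts a, b, c
    Supervisor(kids, "one_for_one").child_failed(kids[1])   # restarts only b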

Erlang's method for restartability does not necessarily make restarts cheap, but it certainly provides a good framework through which the programmer can attempt to make them so.  Providing the AND/OR hierarchical granularity helps isolate the scope of restarts, but there could be situations where a cascade of restarts occurs, each at a larger granularity, until a total subsystem is restarted.  In a different model, the programmer might have known up front that an entire subsystem restart would be necessary and could have bypassed the whole cascade.

Using the process/supervisor model keeps the programmer aware of the unit of restart that their code may be subjected to, and it therefore allows the system to be implemented in such a way that restarts can be enacted at multiple levels--more naive approaches wouldn't necessarily have the clarity of separation to support this.  Faults can thus be isolated to a certain process and its supervisor rather than unwinding the stack to some arbitrary point, and hence fault tolerance is greatly enhanced.

A rejuvenation model for fault recovery could certainly reduce some of the overhead associated with restarting a big AND group, but it's possible that any benefit to be gained by this would be nullified by the additional complexity involved in supporting the rejuvenation process.  Also, I could imagine that rejuvenation isn't as robust as a restart, and therefore additional errors may be introduced by trying to support such a model.

Erlang prescribes an adherence to specification such that if any deviation is detected, an exception should be thrown.  This is contrary to the all too common reality in which a programmer guesses and/or imposes their own personal beliefs about how the system should operate, complicating the matter.

Providing a good mental framework, even if not a concrete software framework, for managing the supervisor/task relationship goes a long way toward influencing the way a piece of software is built.  As such, even though the supervisor logic doesn't come "out of the box", adopting this paradigm still provides a solid separation of concerns which is well suited to the task at hand, and therefore is superior to more traditional organizations.

I don't see that Erlang's model for restarts is that much different from the crash-only model, except for the granularity involved--whereas the crash-only model is defined at the "component" level, Erlang's model decomposes this into a granular hierarchy, each piece of which does essentially the same thing.  The Erlang model benefits and suffers from the phenomenon I described earlier, where a cascade of restarts may be undertaken, ultimately resulting in whole-subsystem restart.  In this case, the crash-only system would have been faster, but in most cases, there is some distinct scope beyond which an error doesn't persist, and thus the Erlang model provides a better way of handling things.

Wednesday, October 21, 2009

Armstrong Ch2

Joe Armstrong's definition of what descriptions a software architecture should be composed of is quite good at characterizing a system.  One thing that I think should be moved out of the problem domain section and into its own section is performance constraints and requirements.  In this case, the problem domain clearly states that a telecom system should exhibit certain behavior.  But in other cases, the problem domain might not be so explicit--a system designed to revolutionize an industry by bringing previously unheard-of speed to solving problems might not characterize the performance requirements in the problem domain; these would be more an attribute of the system we are trying to build to suit the problem domain than an intrinsic property of it.  Required performance guidelines are certainly a central tenet of how a system should be architected.

Messaging for parallelism makes a lot of sense.  It helps to reduce the effect of unnecessary assumptions that can easily be imposed by shared-memory style systems.  I have worked with a number of web-services-oriented applications, which essentially use messaging, and it certainly does enforce strong separation.  However, in most of the systems I have worked on with web services, the calls have been blocking in nature, so no parallelism gains were realized.

"Fail fast" kind of scares me.  The idea that an entire process should terminate its execution upon reaching an error case leads me to believe that performance could be detrimentally affected.  For example, in the case that a process uses a great deal of cached data, assumedly this cache would be flushed when the process "fast fails", and therefore there could be quite a bit of a performance hit by reloading this cache each time the process restarts.

I think that concurrency oriented programming makes sense in certain problem domains.  In others, I could see this style as an impediment to getting the job done.  I suppose that as the supporting tools for this type of programming evolve, the barrier to entry of this programming style will be reduced, but it seems to me that unless concurrency is priority one in the system, adopting this model would potentially make more work for you.  Having said this, more and more systems are setting concurrency as priority one (especially in the server software world), so I am by no means discrediting this style.  Rather, I am proposing that it should be adopted judiciously--only when concurrency is of high enough priority to warrant the overhead.

An unreliable messaging system makes a ton of sense to me.  Just look at how the internet has evolved, and worked well (arguably, I guess) for a wide variety of applications.  To impose a reliable messaging system would be to burden certain types of communication with overhead that they don't require.  Furthermore, as reliable messaging can be built on top of unreliable messaging with proven techniques, I believe that this is the best choice for a messaging system.

Monday, October 19, 2009

Map Reduce

The Map Reduce pattern is quite similar to the Fork/Join pattern.  However, whereas fork/join recursively subdivides one problem into smaller ones and combines the results as the stack unwinds, Map Reduce operates over a large list of independent problems and provides a mechanism for gathering the results.  The primary difference, then, is whether the units of work to be completed are decompositions of a large monolithic problem, or samples in a search space--independent from one another, but culminating in a solution.
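A minimal sketch of the pattern's shape (a word-count style problem in Python, not any particular framework): the map step runs independently per input, and the reduce step just gathers the partial results.

    from concurrent.futures import ProcessPoolExecutor
    from collections import Counter
    from functools import reduce

    def map_fn(document):
        return Counter(document.split())          # independent per-document work

    def reduce_fn(acc, partial):
        acc.update(partial)                       # gather partial results
        return acc

    def map_reduce(documents):
        with ProcessPoolExecutor() as pool:
            partials = pool.map(map_fn, documents)
            return reduce(reduce_fn, partials, Counter())

    if __name__ == "__main__":
        docs = ["the cat sat", "the dog sat", "the cat ran"]
        print(map_reduce(docs))                   # Counter({'the': 3, 'sat': 2, 'cat': 2, ...})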

Errors should be handled in a configurable way, allowing the application developer to specify behavior on a case-by-case basis.  For some problems, failure of any subproblem may prevent the problem from being solved, whereas in, say, a Monte Carlo simulation, the failure of one task may be essentially inconsequential, presuming the failure isn't related to a flaw in the underlying model.  As such, the application developer would like to be able to either ignore the error, schedule the task for re-execution (in case the error was transient), or terminate the computation.  We certainly wouldn't want the framework deciding to terminate the computation, potentially losing all existing results, due to a small anomaly in one task.  Hence, a configurable error handling system would be the only way to make a framework sufficiently general-purpose.
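Something like the following sketch is what I have in mind, with made-up policy names (IGNORE, RETRY, ABORT) standing in for whatever a real framework would expose per task:

    IGNORE, RETRY, ABORT = "ignore", "retry", "abort"

    def run_task(task, policy, max_retries=3):
        attempts = 0
        while True:
            try:
                return task()
            except Exception:
                if policy == IGNORE:
                    return None               # drop this sample, e.g. one Monte Carlo trial
                if policy == RETRY and attempts < max_retries:
                    attempts += 1             # transient failure: try again
                    continue
                raise                         # ABORT (or retries exhausted): surface the error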

I haven't ever used this pattern, and I actually had to do a little bit of Wikipedia-ing in order to get a better idea of real applications.

This pattern excels when the majority of the work is performed in the "map" function and can be executed independently.  When there are many interdependencies in the "reduce" function, the pattern won't scale well, as reductions may need to be deferred until other mappings have been performed.  That case may be outside the scope of the pattern, but if so, it should be added to the pattern's list of weaknesses.

Event Based, Implicit Invocation Pattern

The "Event Based, Implicit Invocation" (EBII) Pattern is one that is so incredibly common, it's almost redundant to document it as a pattern, but nonetheless, for completeness, it can prove useful to understand the differences and constraints in implementing it.

The key difference between this pattern and the Observer pattern is the cardinality and loose coupling between the sender and receiver.  Whereas in the Observer pattern the "publisher" knows its "subscribers" and issues notifications, in the EBII pattern, the publisher knows nothing about its "subscribers", but rather just knows its own notification mechanisms.  These mechanisms interface directly only with a manager component, which is responsible for distributing these notifications.  Additionally, whereas the Observer pattern states a "one-to-many" cardinality, the EBII pattern says that any object can publish events and any object can subscribe to events.

When it comes to dispatch methods, implicit invocation provides greater decoupling than explicit invocation; with explicit invocation, the pattern is considerably closer to the Observer pattern.  An additional dimension of decoupling comes from using nonblocking calls.  If blocking calls were used, the notifying object's timing would be affected by the receiving object's handler execution to a considerably greater degree than in the nonblocking case.  Obviously, the event handler will still utilize system resources, but this won't affect the timing semantics of the program as fundamentally.
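A minimal sketch of the manager component with nonblocking dispatch (all names are my own, not from the pattern text): publishers call publish() without knowing who is subscribed, and handlers run on a background thread so the publisher's timing isn't tied to handler execution.

    import queue, threading, time
    from collections import defaultdict

    class EventManager:
        def __init__(self):
            self._subscribers = defaultdict(list)
            self._queue = queue.Queue()
            threading.Thread(target=self._dispatch, daemon=True).start()

        def subscribe(self, event_type, handler):
            self._subscribers[event_type].append(handler)

        def publish(self, event_type, payload=None):
            self._queue.put((event_type, payload))    # nonblocking for the publisher

        def _dispatch(self):
            while True:
                event_type, payload = self._queue.get()
                for handler in self._subscribers[event_type]:
                    handler(payload)                  # implicit invocation of subscribers

    manager = EventManager()
    manager.subscribe("feed_updated", lambda item: print("new item:", item))
    manager.publish("feed_updated", "post #42")
    time.sleep(0.1)   # toy example only: give the background dispatcher a moment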

Applications of this pattern are so prolific (as previously mentioned) that the author names whole classes of programs that use it (such as all clients registering over a network to receive some sort of notification); it's hard to think of an example that doesn't fit under the umbrella provided.  RSS feeds would fall under this categorization as a utilization of this pattern.

As is explained in this pattern, the manager component has everything to do with how the event notification system scales.  As such, it would be inappropriate to implement a single notification broker for all types of events.  Imagine the same system being responsible for handling I/O interrupts as well as RSS notifications--the performance and complexity requirements are so divergent that one system could not be expected to span that expanse.

Error handling in an event based system is the responsibility of the event receiver.  If an event sender *really* needs to know about a failure in another component, it should register a listener for the receiver's error event.  This introduces a bit of a bidirectional constraint, but still maintains a great deal of decoupling.