Wednesday, September 30, 2009

Fork/Join

Multi-core processors are here to stay.  While they're not really general-purpose cores, I was amazed by the 216 cores (stream processors, in this case) on each of the two new NVIDIA GTX 260 cards I just got...this just goes to show that skill at decomposing algorithms for parallel execution will separate the men from the boys over the next few years.  On dual or even quad core machines, a programmer could generally get away with a single-threaded algorithm for all but very large computations, but as we're driven towards massively increasing numbers of processing cores, the experiences users will demand by then won't be feasible with single-threaded code.

There is always a chasm between ease of implementation and maximum performance with such technologies, but I believe that user-friendly, general-purpose frameworks with pluggable/configurable performance enhancements will make it easy to start with a simple implementation and then, as requirements demand, move towards performance tweaks at the framework configuration level.

One type of system I can think of that does something like this is a SQL database platform.  The tools/language present an interface conducive to parallelizing a workload through non-prescriptive, declarative SQL statements.  Then, later on, users can optimize performance by tweaking indexes and providing execution hints.  While it is true that a totally custom solution to the same problem could beat the best possible implementation in a SQL database, the performance is usually close enough to be acceptable, given the reduced programmer effort.  One long-standing weakness of SQL has been how awkward it is to express recursive calculations/logic, but recursive Common Table Expressions (standardized in SQL:1999 and supported by Microsoft since SQL Server 2005) have started to address this, albeit only in fairly specific and simple situations.
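
As a concrete illustration of the recursive CTE feature just mentioned, here's a minimal sketch in Java/JDBC; the connection URL, table, and column names are hypothetical placeholders, and it assumes a SQL Server JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RecursiveCteDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string -- adjust for your environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=Demo;integratedSecurity=true");

        // A recursive Common Table Expression: walks an employee/manager
        // hierarchy to arbitrary depth in a single declarative statement.
        String sql =
            "WITH Reports AS ( " +
            "    SELECT EmployeeId, ManagerId, 0 AS Depth " +
            "    FROM Employees WHERE ManagerId IS NULL " +
            "    UNION ALL " +
            "    SELECT e.EmployeeId, e.ManagerId, r.Depth + 1 " +
            "    FROM Employees e JOIN Reports r ON e.ManagerId = r.EmployeeId " +
            ") " +
            "SELECT EmployeeId, Depth FROM Reports";

        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt("EmployeeId") + " at depth " + rs.getInt("Depth"));
        }
        conn.close();
    }
}
```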

Moving back to the Java Fork/Join framework, I can see that this would be useful when the algorithm can be decomposed with exactly the semantics the framework supports.  I don't have much experience with parallel algorithms, so I don't really know what percentage of algorithms are efficiently expressible in this manner, but that will be a major factor in the widespread adoption of such a framework.  Even when an algorithm is expressible in this form, if the result is considerably less intuitive and therefore harder to maintain, that could be another barrier to adoption.
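
Just to make that decomposition style concrete, here's a minimal divide-and-conquer sketch using the java.util.concurrent classes the framework eventually shipped as (the jsr166y preview available today has essentially the same shape); the array-summing task and the threshold value are purely illustrative.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a long[] by splitting the range until chunks are small enough to compute directly.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000; // tuning knob, not prescribed by the framework
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                      // queue the left half so an idle worker can steal it
        long rightSum = right.compute();  // keep working on the right half in this thread
        return rightSum + left.join();    // wait for the queued/stolen half and combine
    }
}

public class ForkJoinDemo {
    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new SumTask(data, 0, data.length)));
    }
}
```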

I would cite the widespread popularity of high-level, garbage-collected languages as reason enough not to concern oneself too much with language-level performance constraints.  Inevitably, if such languages stay popular, the overheads incurred will be worth the benefit derived; otherwise the community will either optimize the language (or VM) or move to another solution.  I don't think that this particular framework is so different from any other performance consideration as to warrant special concern.

The work scheduling/stealing algorithm could lead to suboptimal performance if a task is stolen that requires a great deal of data to do its work but doesn't require a great deal of computation time, specifically on a Non-Uniform Memory Access (NUMA) system.  In this scenario, a disproportionate penalty is paid to move the data from one thread/processor's context to another, without the desired boost in parallel execution performance.

Tuesday, September 29, 2009

Emacs/Featurism

Emacs is one of those systems where I've never quite understood the appeal.  I'm sure this is because I've never used it for more than open/simple edit/save, but still, it seems like a relic from an age long since past.  I think that its success can be attributed to (in addition to the architecture) its highly technically competent user base, along with the lack of other tools to get similar jobs done.  This created a bit of a power vacuum, and acted like a lens, focusing the technical expertise of its age upon itself.  If it were not open source, I don't think that a system like this could evolve.  Firstly, it would be rare to find an organization with enough technical expertise across a large enough base of users to get beneficial functionality built.  Secondly, as we have learned, software tends to model the organization that built it, so an organization large enough to have the necessary complement of technical expertise would also likely have a great deal of bureaucracy, leading to software with a great deal of bureaucracy and therefore considerably more cumbersome interface contracts.  Additionally, smart (in the traditional sense) organizations try to reduce duplication of effort, so they would likely not encourage experimentation to the degree necessary, for fear of paying two people to do the same thing.

The architecture of Emacs was very forward looking for its time.  In an age where static languages and mainframes were king, Emacs dared to build an expressive and remarkably high-level controller language in the form of Emacs Lisp.  Emacs benefited from the simplistic terminal model prevalent at the time: complex user interactivity (such as is demanded today) was not common, so a very simple model could emerge.  Furthermore, the pace of technology at that time was such that vast arrays of tools could be built upon that model with little demand for change in the underlying model--hence it had a stable platform.  These attributes are turning into disadvantages today, when advanced user interactivity and graphical interfaces are demanded.  Emacs has stretched its usefulness far beyond what I would have foreseen, but I think it's about at its breaking point.  As the user experience at large continues to become more powerful, glossy, and user friendly, the sharp learning curve that Emacs presents, along with its antiquated model, will inevitably lead to a distinct lack of future demand, and therefore a slow path to obsolescence, as its aging user base exits the mainstream of software development and technology.

As I have stated, Emacs benefited from a very simple model.  This worked for Emacs, because at its conception, the very simple model, along with a small complement of enhancements, was all that was warranted. If we were to start to build a system today aimed at providing a feature set comparable to Emacs, it would be totally impossible to start in the same way.  Sure, MVC is still a good choice, but users (of Emacs) accept its primitive interface as the result of a long evolution, and a new application would not gain such a nostalgic acceptance.  Today's interactivity models require fundamentally more complex abstractions, and therefore growing a system from such a primitive design wouldn't work.

Avoiding complexity in implementation is a touchy subject, something that I have been grappling with in the past few weeks.  As I am designing a system that requires a great deal of overall complexity, the primary questions are: How complex does the design need to be? and Where should this complexity lie?  One concern I've had recently is that as you iterate through an application's design, you have a tendency to shovel complexity around in order to reduce the complexity of the current subsystem.  If you don't iterate enough, the last subsystems to be visited end up resembling the "Sweeping It Under the Rug" pattern from the "Big Ball of Mud" paper.  If complexity had been accepted at each level to the extent it deserved, the system would be balanced in complexity and the separation of concerns would be clearer; instead, the desire to simplify each subsystem to the utmost leaves us with one or two catch-all subsystems into which the dirty laundry of all the others is quietly marshaled, where it festers into an ugly tumor on the system.  This is not to say that we should make systems complex for complexity's sake, but rather that we should not fear complexity at each step of the way.

As for Firefox replacing Emacs, I suppose that at some level it already has (or at least the whole class of web browsers has).  As browsers expose an MVC-type framework upon which entire applications are built, those applications are coming to play much the same role that Lisp code plays in Emacs, where it provides the majority of the functionality.

OPL

The OPL paper groups patterns into a hierarchy organized by level of abstraction, and therefore by the proposed target audience of each pattern.  As the paper alludes to, however, many developers must be "tall and skinny" in today's world, where appropriate frameworks are not available.  This organization may prove more effective as time progresses and more relevant frameworks emerge.

As for the granularity of the patterns, I think it would be premature to start to generate a detailed encyclopedia of parallel patterns, as the applicability of patterns at a pragmatic and proactive level hasn't yet been addressed. OPL forms a good categorization of these patterns from which further refinement can be tackled.

I think that parallelism in general has been overlooked by application-level programmers for years for a number of reasons, some relating to skill, but many relating to business pressures.  Oftentimes, very aggressive release deadlines coupled with extensive feature sets, taken in conjunction with uncertainty about the future of the software, lead to a mentality about software architecture that I would paraphrase as follows (perhaps to be called the pragmatic and overworked programmer's manifesto):
I have nowhere near enough time to build this system "right".  In fact, I'm not sure that I really know what "right" is anymore, given the eternal chain of short-deadline projects that I've worked on over my professional career.  I've developed my skills to build systems at a breakneck pace, giving little thought to the overall performance or long-term viability of the application's architecture.  Since programmer time is expensive compared to ever cheaper machine time, management doesn't want me to spend any time "optimizing" something that they feel they can just "throw more hardware at".  Furthermore, no one really knows if this software is going to see version 2.0.  If I don't get version 1.0 out the door, we certainly won't, and perhaps regardless of my best intentions, this program may fail to meet the users' needs, or the users may never have had these needs strongly enough to make this system viable in the first place. Oh well, if it's a success, we can always go back and "make it right"...or not...who wants to mess with code that's "good enough"?  Until a competitor comes along and shows the users that this type of software can be way faster than ours, the users will generally learn to accept the performance that this software offers, so why waste time parallelizing it?
The problem with the above manifesto is that although time-to-market can make or break a business, lack of scalability can do the same.  If the software succeeds, by the time the programmer goes back to "make it right", they'll have users beating down their door for more features.  In the end, if they really *have* to optimize through parallelization, they'll likely find a hack of a solution, which doesn't work in the abstract case but causes few enough issues to suffice.  This will make the application brittle to future changes.

It's my opinion that as frameworks evolve and programming languages become more expressive and declarative, more parallelism will be implicit.  At some point, these frameworks for parallelism and overall application architecture will become powerful and easy enough that we reach a break-even point between implementing the system in a breakneck, no-time-to-think-about-architecture manner and a paradigm where most of the functionality falls out of having the appropriate architecture.  At that point, we'll start to see a migration towards parallel frameworks.

Parallelism has become a more pressing problem as the gains from Moore's law have stopped showing up principally as clock speed and started showing up as more cores.  All of a sudden, developers who were banking on the next generation of faster-clocked machines to make their single-threaded software run faster have been left out in the cold, wondering how they can retrofit parallelism--not an easy task for a complex system.

Metacircular JVM

Metacircular VMs provide a very interesting model for reusability as well as performance.  By implementing the VM in the language hosted inside of the VM, there is a co-evolution of features in the VM and in applications, and the VM itself can benefit from the features it is intended to provide.  It's a bit hard to wrap your head around--a bit of a proverbial chicken-and-egg at times, but the bootstrapping process works out these kinks.

This bootstrapping process is probably one of the only disadvantages of such an architecture.  Whereas a VM built in a natively compiled language would "just run", here there is a complex process of image generation and layout that has to be worked through.

As for the threading model, it would certainly be advantageous to get the best threading performance available, likely through the thinnest possible abstraction over the kernel, but as is mentioned, there are some circumstances where the JVM knows more about what's going on (such as the uncontended locking case mentioned) and can therefore achieve greater efficiencies.  Therefore, I believe that a pluggable threading model is advantageous, such that the JVM can be tailored to the situation when necessary.
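
As a sketch of what I mean by "pluggable"--the interface below is purely hypothetical, not an API from the paper or from any real JVM--the VM would code against a small threading abstraction, and the concrete strategy (a thin mapping onto kernel threads, or a smarter VM-managed one) would be chosen at configuration time.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical abstraction -- invented for illustration only.
interface VmLock {
    void acquire();
    void release();
}

interface ThreadingModel {
    void start(Runnable task);   // schedule a VM-level thread
    VmLock newLock();            // locking primitive the model is free to optimize
}

// Thin mapping straight onto kernel threads and standard locks.
class NativeThreadingModel implements ThreadingModel {
    public void start(Runnable task) {
        new Thread(task).start();
    }

    public VmLock newLock() {
        final ReentrantLock lock = new ReentrantLock();
        return new VmLock() {
            public void acquire() { lock.lock(); }
            public void release() { lock.unlock(); }
        };
    }
}

// A VM-managed model could return a VmLock that spins briefly, or elides
// locking entirely when it can prove a lock is uncontended, while the rest
// of the VM code stays unchanged.
```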

Wednesday, September 23, 2009

AOM

I think that AOM is a terrific architectural style!  I've developed one system in the past that was a simple but near-textbook instance of it.  The decision to build the system this way was based on the reality that, due to politics and the indecisiveness of the client (don't get me started!), the schema for a traditional data management solution couldn't be decided totally up front--the requirements would shift drastically from day to day.  I experienced the true power of such an architecture when I was able to update the domain model and, along with the automatic web UI generation I had in place, have an updated application ready for review within 15 minutes of the conclusion of the conference call detailing said changes.  I'm also currently developing a much larger system which exhibits some much more advanced concepts from this architectural style, designed to allow the system to be re-configured at runtime to account for changes to rules and data both in the existing sub-domains and in (similar) domains that haven't been thought of yet.

One of my primary concerns as systems like these grow is performance.  I'm using a relational database backend, which has been optimized by the vendor to provide very efficient operations on sets of data defined in database schemas.  When you model the domain at a higher level of abstraction, you break a great deal of the anticipated locality of data, and spread out reads across a much larger set of rows.  Time will tell what type of optimizations will be necessary to keep a system like this performant as it scales to a large number of concurrent users.

For an Object to change its TypeObject, some translation logic may need to occur.  You can imagine a scenario where such a conversion may fail--if the destination TypeObject has a different set of properties, some required properties may need to be provided, and existing properties may be lost if they don't adhere to the specifications.  Additionally, there may be business rules relating the existence of various objects, so perhaps the disappearance of the source object would be a problem...perhaps there are cardinality constraints on the destination object, and even more confusing, what if there were validation logic that could only be executed at a certain point in time...perhaps a function that depends on the current date/time.  In this scenario I don't know that you can say that, in the general case, an Object CAN change its TypeObject.  In the absence of other constraints, sure, it's just a mapping of properties, but the real power of such an architecture is being able to model complex rules without knowing about those rules at compile time.
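
To make the re-typing concern concrete, here's a stripped-down, hypothetical sketch of the Object/TypeObject relationship; the class names, properties, and the required-property check are invented for illustration, and a real system would hang the richer rule/validation machinery discussed above off of the TypeObject.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A property definition held by the TypeObject, not by the instance.
class PropertyType {
    final String name;
    final boolean required;
    PropertyType(String name, boolean required) { this.name = name; this.required = required; }
}

class TypeObject {
    final String name;
    final List<PropertyType> properties = new ArrayList<PropertyType>();
    TypeObject(String name) { this.name = name; }
}

class Entity {
    private TypeObject type;
    private final Map<String, Object> values = new HashMap<String, Object>();

    Entity(TypeObject type) { this.type = type; }

    TypeObject getType() { return type; }

    void set(String property, Object value) { values.put(property, value); }

    // Re-typing keeps only the properties the destination type knows about,
    // and fails if a required destination property has no value to carry over.
    void changeType(TypeObject newType) {
        Map<String, Object> migrated = new HashMap<String, Object>();
        for (PropertyType p : newType.properties) {
            Object v = values.get(p.name);
            if (v == null && p.required) {
                throw new IllegalStateException("Missing required property: " + p.name);
            }
            if (v != null) migrated.put(p.name, v);
        }
        values.clear();
        values.putAll(migrated);
        this.type = newType;
    }
}
```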

Versioning in an AOM architecture comes in two flavors--data versioning and model versioning.  Data versioning can be handled by a common set of functionality across all objects by storing updates as new values, with associated timestamps, and upon retrieval, always retrieving the most recent data.  Model versioning can be achieved through translation of "V1" domain objects to "V2" domain objects, or simply by allowing these two different versions of the same type of object to exist in the system at the same time.
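
And a minimal sketch of the data-versioning half, assuming a simple in-memory representation (a real AOM would persist this history): each property keeps a timestamp-ordered set of values, and reads return the most recent one.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of timestamped property versions.
class VersionedProperties {
    // property name -> (timestamp -> value), ordered oldest to newest
    private final Map<String, TreeMap<Date, Object>> history =
            new HashMap<String, TreeMap<Date, Object>>();

    void set(String property, Object value) {
        TreeMap<Date, Object> versions = history.get(property);
        if (versions == null) {
            versions = new TreeMap<Date, Object>();
            history.put(property, versions);
        }
        versions.put(new Date(), value);   // store as a new timestamped version
    }

    Object getCurrent(String property) {
        TreeMap<Date, Object> versions = history.get(property);
        return (versions == null || versions.isEmpty())
                ? null : versions.lastEntry().getValue();   // most recent wins
    }

    Object getAsOf(String property, Date when) {
        TreeMap<Date, Object> versions = history.get(property);
        if (versions == null) return null;
        Map.Entry<Date, Object> e = versions.floorEntry(when);
        return e == null ? null : e.getValue();
    }
}
```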

The "explosion of new types" won't happen at the code level, due to the nature of modeling the domain in data, but it can happen in the model itself.  Controlling the growth of the model is as simple as controlling the users who are allowed to change the model, providing adequate training to these users so that they implement domain entities in the desired manner, and ensuring that appropriate communications channels are available, so that functionality isn't duplicated across users and/or time.

JPC

Emulation has advantages over VMs because the host machine's hardware architecture does not have to be the same as the guest machine's hardware architecture.  They do, however, suffer from inferior performance (as compared to VMs), due to the additional layer of indirection and translation involved.  Being implemented in Java, JPC provides the user with more confidence in isolation/security than a native C++ emulator, because of the additional layering within a hardened container runtime.  C++ is a bit closer to the hardware, however, so it would perform a bit better.

I think the JPC team is a bit naive in their assessment that a hard disk image could be loaded from anywhere on the internet; there is an implied performance problem in that claim.  Operating systems and applications are designed with the assumption that the disk will exhibit latencies within a certain range, and when you make those latencies an order of magnitude worse, unacceptable performance is the likely outcome.

With the way JPC has been architected, implementing an emulator for another processor architecture would be considerably easier than starting from scratch--a large number of components could be reused, as JPC abstracts most operations down to their logical equivalents.

With advances in virtualization technology, I can't see myself putting up with the inferior performance of an emulator, and I don't think it would be particularly useful to run any sort of modern operating system on a mobile device--given the current (improving) state of cell data networks, I think it's much more useful to access a remote machine through a VNC or RDP type connection from a mobile phone than to try to emulate the entire x86 hardware stack and accompanying OS.

Wednesday, September 16, 2009

7 Layer Software Burrito

Layering is a pattern that must be applied judiciously.  Not that any pattern can be applied mindlessly, but layering specifically can cause major problems when it isn't called for.

Under the right circumstances, however, layering can help a system withstand the test of time.  As access to apps on mobile devices proliferates, layered applications will benefit greatly, as they may be able to change just the top one or two layers in the stack to support the new UI and communication semantics, hopefully leaving the majority of the system untouched.

I can't recall if it was in one of our readings, but a quote stands out in my mind to the effect of "show me a team boundary, and I'll show you a software boundary"--the organization of teams tends to naturally dictate the separation of software.  Layering works great for this, because you can divide a project into a set of layers that *should* have boundaries, and exploit this behavior rather than suffer from it.  I think this is an interesting consideration to make while designing a layered architecture.

Most applications have some form of natural layering, regardless of whether they use the formal definition thereof.  Refactoring an application to use an explicitly stated layered architecture can take on radically different forms.  In an application with excellent separation of concerns and modularization, adding layering may be as simple as uncovering the communication pathways and more strictly defining the interfaces involved (see the tiny sketch below).  In more poorly designed applications, where spaghetti code and dubious design practices are present, layering would probably best be left for the inevitable redesign/recode, though analysis of the existing code and the lessons learned therein would surely shed light on a likely layer decomposition.
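
As a small, hypothetical illustration of "more strictly defining the interfaces involved" (all names invented): the upper layer depends only on an interface describing the communication pathway, so the lower layer can be swapped--say, when a new mobile front end or data store arrives--without touching it.

```java
// Hypothetical layer boundary -- contract owned by the layer above.
interface CustomerRepository {
    String findName(int customerId);
}

// Persistence layer: one implementation of the boundary.
class SqlCustomerRepository implements CustomerRepository {
    public String findName(int customerId) {
        // Real data access (JDBC, ORM, etc.) would live here.
        return "customer-" + customerId;
    }
}

// Application/service layer: depends only on the interface, never the implementation.
class GreetingService {
    private final CustomerRepository customers;

    GreetingService(CustomerRepository customers) { this.customers = customers; }

    String greet(int customerId) {
        return "Hello, " + customers.findName(customerId);
    }
}
```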

Xen Garden

Before Xen, the best a virtualization platform had done was to scan the code of a running virtual guest, and perform a translation of instructions that were incompatible with the virtualization concept into ones that matched the paradigm.  As you can imagine, this introduces quite a bit of overhead.

Xen challenged this with a model of paravirtualization: forgoing "perfect compatibility" and modifying the guest OS's code to use an alternate set of virtualization-friendly operations.  This takes the burden off the hypervisor at run time, and also lets developers make smarter choices than a one-size-fits-all instruction translator could (especially when performance is already an issue--it would be counterproductive to run an expensive algorithm at runtime to optimize runtime performance!).

Xen's architecture was built on the premise of mutual distrust, and it set out to challenge the approach that hardware-emulating virtualization platforms had taken before it.

Xen uses "domains" to host guest operating systems, and Domain 0 is a special domain that handles semi-privileged tasks, removing them from the core of the hypervisor, and facilitating easier development and testing.

With processors now supporting virtualization natively, the big task was to handle the exceptions that indicate an instruction has been executed from an illegal context, and to figure out how to execute it in a virtualization-friendly manner.  The fact that Xen is open source was invaluable here, because engineers from Intel and AMD, who had the most intimate knowledge of each processor's feature set, could contribute code directly.

The IOMMU extends the reach of hardware directly into the virtual environment by providing an abstraction layer through which access permission could be granted, while maintaining the integrity of the virtual environments.

Tuesday, September 15, 2009

Pipes & Filters

While I haven't done too much with the UNIX style pipes and filters, I have built a system that performs a series of data transformations and applies business logic in a fashion akin to a pipes and filters paradigm, though due to the technologies involved, it wasn't explicitly viewed as such.

In the same system, it would have been hard to strictly segregate the work along a pipes and filters model, due to the diversity of errors that can occur, the need to handle them carefully, and the need to respond in different ways to different compound conditions in the system.

As far as parallelization goes, I believe that the primary criteria for selecting the pipes and filters pattern would be the incremental nature of data processing and commonality of data representation.  If all data flows can be modeled such that the necessary input buffer is very small for the filter to perform work, and a common format can be agreed upon for all pipes that make the "glue logic" overhead minimal, there will likely be parallelization gains.

Active filters would be best in scenarios where large flows of data are likely to occur at irregular intervals, and parallelization gains are desired.  In these cases, having filters ready and able to process input when it becomes available makes good sense.  Passive filters would work better in a subsystem type environment, where a processing pipeline exists to process some type of request, but which is not the primary work of the system, and therefore not worth the active process overhead.
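
A minimal sketch of an active filter, assuming Java's BlockingQueue as the "pipe" and String as the agreed-upon common data format (the sentinel-based shutdown is just one simple convention):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ActiveFilterDemo {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> in = new LinkedBlockingQueue<String>();
        final BlockingQueue<String> out = new LinkedBlockingQueue<String>();

        // Active filter: its own thread pulls from the input pipe, transforms,
        // and pushes to the output pipe as soon as data becomes available.
        Thread filter = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String item = in.take();          // blocks until data arrives
                        if (item.equals("EOF")) break;    // sentinel ends the stream
                        out.put(item.toUpperCase());      // the "filter" transformation
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        filter.start();

        in.put("hello");
        in.put("world");
        in.put("EOF");
        filter.join();
        System.out.println(out);   // [HELLO, WORLD]
    }
}
```

A passive filter would instead expose the transformation as a method the caller invokes inline, avoiding the extra thread at the cost of not overlapping work.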

Awww...look at our little data, all grown up, and interacting with the real world!

In contrast with some other classmates/bloggers, I've only recently even signed up for Facebook...I've been a real social networking laggard, generally seeing most features as time-wasters, but I certainly can't argue with its popularity.  It has provided a leverageable platform for application developers to reuse and extend, and a great benefit in facilitating reach to end users.

The architecture of Facebook is interesting to me because it has to deal with some fundamentally challenging issues of trust and privacy, as well as weaving these issues into the constantly changing landscape that is the Web, and I think that the Facebook engineers have done an admirable job.

Obviously, Facebook realized that they couldn't keep up with the diversity and pace of users' desires, and hence the 3rd party application system is the natural way to continue growing the business without having to bear the ongoing brunt of innovation.

The architecture they put in place to support this is quite clever, and it has obviously proven its mettle, as can be witnessed by the abundance of 3rd party content/providers and of users consuming that content.

FQL supported this both by bringing a more familiar "query" paradigm to the API calls and by exploiting operation batching to some degree.

The 3rd party application model is an interesting and challenging one, because typically applications are trusted with the content they must display, but in this situation that's not the case--similar to asking a taxi driver to drive you to a secret location while blindfolded!  Facebook made this possible, however, by changing the model of data access.  With FBML, the application doesn't so much process the data as declare what data should be processed and presented--essentially offloading some of the application's runtime into the Facebook environment.

As the power of JavaScript has grown, its presence in the 3rd party application environment has become inevitable.  Instead of taking an Apple iPhone approach--reviewing/inspecting and then approving or denying entry--Facebook made the model much more open.  Instead of jumping through the bureaucratic hoops of a review process, developers must jump through technical hoops, by structuring their browser code within a restricted framework.  For a site like Facebook, I think this was a smart choice, given the rapid pace at which such content changes.

Facebook's architecture gives us a glimpse into one possible future of how networked applications are built.  Whereas traditionally the application/developer has been given free rein over all pertinent data in a process, Facebook's architecture makes us rethink the validity and sustainability of this paradigm--if applications/developers have access to all the data they process, how hesitant are we going to be to provide access to that data, thereby limiting future growth?  If, on the other hand, a method of computation and composition can be offered, as in the Facebook architecture, wherein sensitive data is kept within tightly controlled confines, the future of such applications is certainly bright!

Thursday, September 10, 2009

Christopher Alexander & Patterns

I thought these chapters were very interesting!  Over the past few days I've been looking over some of my wife's architecture books (she's an Interior Design student), and I can see more and more parallels between the two fields.  Initially, I had seen these parallels mostly in regard to building large public buildings, with the various disciplines, stakeholders, and forces involved.  Christopher Alexander has made me see how architecture and patterns relate to a much wider range of situations.

A pattern language, either in software or in physical architecture, provides a mechanism for describing a facet of design: its purpose, uses, strengths, motivations, applications, and implementation.  Pattern languages help break down tasks (once again, in either physical or software architecture), and provide templated building blocks with which the designer can go about building a system/structure with proven quality attributes.  Alexander stresses that these pattern languages can help designers match the context, function, and form of various facets to the overall whole, and they provide a framework for organic growth, wherein a pattern doesn't impose a restriction but rather opens a pathway of inquiry.

I think that physical architecture patterns differ from software patterns in that they are more immutable.  Whereas physical architecture is subject to the laws of physics--statics, optics, etc.--software is subject to an ever-changing set of fundamental paradigms.  Though ideas come in and out of vogue in both fields, I think that software patterns have to be more flexible and complex, because even though the author talks about buildings, towns, etc., coming into being through organic processes, there is still a relatively static end product--you may build a house based on patterns, but you're not likely to start shuffling the windows around once it's built.  Software, on the other hand, is always changing; as requirements evolve and the business environment changes, software is expected to evolve gracefully to meet those needs.  This differs from the evolution of a town: in a town, you may build new things to meet current needs while remaining contextually relevant, but you don't undertake changes as sweeping as you might with a software system--you don't talk about lifting up a whole town and placing it on a more modern sub-grade--whereas with software, upgrading the platform and paradigms is an ongoing challenge and requirement.

I think that (physical) architectural patterns are timeless to the extent that physics and basic human nature are immutable.  Some patterns that relate to the current state of technology or lifestyle (such as barns) are less so.  Even in software patterns, there are elements of timelessness: perhaps not in the exact wording or the digital world-view prevalent at the time of writing, but in the fundamental decomposition of ideas and intentions, certain aspects of software patterns, too, will live forever.

RESTing up for Ch5

REST to me seems like a mapping of a relational database to the world of the web. Only 4 verbs?  That's fine...SQL's done just fine with SELECT, INSERT, UPDATE, and DELETE for years.  REST implies that you can layer on an arbitrary level of complexity in the type of query logic you can express, so I don't see it being restrictive there...
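
To spell out the verb mapping I have in mind (the resource URL below is made up, and the mapping is only a rough correspondence), the HTTP methods line up with the four SQL statements:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Rough CRUD correspondence:
//   GET ~ SELECT, POST ~ INSERT, PUT ~ UPDATE, DELETE ~ DELETE
public class RestVerbDemo {
    public static void main(String[] args) throws Exception {
        URL resource = new URL("http://example.com/customers/42");  // hypothetical resource

        HttpURLConnection get = (HttpURLConnection) resource.openConnection();
        get.setRequestMethod("GET");                  // read the resource (SELECT)
        System.out.println("GET -> " + get.getResponseCode());

        HttpURLConnection delete = (HttpURLConnection) resource.openConnection();
        delete.setRequestMethod("DELETE");            // remove the resource (DELETE)
        System.out.println("DELETE -> " + delete.getResponseCode());
    }
}
```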

Really, aren't all systems "data" oriented?  Whether you consider "data" to be the state of your object graph or fields in a database table, the purpose of software in general is to manage data.  Sometimes it's de-emphasized and subordinated to the workflows or behaviors that a system presents, but that's just semantics.

While I can see that it could be useful to always go to the same URL for the same data, I'm not sure how well this would work for a lot of compound querying.  It goes back to invoking behaviors, but at some point, the system is going to exhibit behaviors, and they have to be invoked somehow, so whether you consider it to be a type of query on one of the types of data involved, or a separate contract, there are still going to be situations where you need to do something that involves a bunch of different types of data, and return a bunch of different types of data.  Now where should we implement such an operation?  I don't think that REST makes this very clear.

In fact, I think that what REST tries to do follows a common theme throughout the evolution of the software world.  Someone comes up with a system that represents something well, and then over time, extensions get added until it can do everything that everyone wants, by which point everyone considers it bloated and starts looking towards a "simpler" format that accomplishes the current view of what the core functions are--but in the process, they abandon a lot of the power that had been built into the previous systems/formats.  I think REST falls into this same trap.  Sure, it delivers the idea of finding the same data in the same place, and offers some benefits such as architectural memoization, but it falls short of specifying how query/selection logic, as well as behavior invocation, should be handled.  So the next step is that someone will develop some extension that standardizes that, and we'll be further away from the utopian view of what REST should be, and not a whole lot better off than we were with web services.

Don't get me wrong, each new such technology leverages advances throughout computing, and the evolved system is usually better than its predecessors, but I don't know why each such advancement has to be a paradigm shift.  Why can't we just go back and revise existing standards and trim the fat and extend towards the future?

This chapter emphasizes "not being tied to certain infrastructure", but the reality is that you probably will be tied to quite a bit.  You'll end up building your system on a set of libraries and frameworks, and with certain methodologies that work nicely with your current technology stack, and if you want to make a serious infrastructure change, it'll be no easy task no matter what.  Wasn't the purpose of the IP protocol to let different systems with different physical architectures, scales, and purposes all talk to one another in a common way?  In the same way that IP has grown into a bunch of different vendor-specific formats and extensions, so shall REST, and as with most things, I doubt it will live up to the hype.

From a pragmatic view, it's just another different perspective on data, and one more layer of indirection in the world to resolve resources.

Tuesday, September 8, 2009

BA Ch 4 & ArchJava

Both BA Ch. 4 and the ArchJava paper relate to a concept that remains quite elusive (at least to me) in the modern world of software--one which is perhaps just an embodiment of software architecture in general, but whose specific instances really hit home for me: how do you take a set of (potentially) idiomatically distinct libraries, frameworks, and ideologies and merge them with the ideal view of how software should fit together?

Disclaimer: I can't speak from a Java perspective, as I somehow managed to get my whole BS in CS without writing a single line of Java, so I'm going to have to use .NET technologies as a close substitute.

ArchJava approaches this question from the standpoint of language enhancements on top of Java, asking for any code you write to be reworked within a paradigm of component hierarchies, ports, and connections.  One big problem I have with this, and with many idealistic patterns and practices, is that they often fail to recognize and leverage the existing infrastructure that constituent software brings to the table.  For example, when writing an ASP.NET page, which already has its own concept of component (or control) hierarchies and communication strategies, I'm not quite sure where an ArchJava-type solution leaves us.  You might say that we'll consider the whole ASP.NET subsystem to be excluded from the ArchJava-type code model, but then as soon as you try to perform extension in a few different ways (such as control subclassing or HTTP pipeline processing add-ins), you break that scope, and it seems that only the much less strict, and less powerful, mode of ArchJava can be used.

The chapter "Architecting for Scale" grapples with this question, though it doesn't provide extensive detail on a proposed ideal solution to the extent that ArchJava does.  I found the Form/Binding paradigm to be very interesting, in that it goes a long way to leverage existing technologies (such as SWING), but generates a wrapper/abstraction that allows the developers to work within the application's paradigm. The buy vs. build decisions also play into my question, as the "roll your own" is guaranteed to be much closer to the ideal architectural style, though potentially less robust, and often more expensive.

One thing that, coming from the .NET world, I see as paramount is the design-time support for these tools and paradigms.  In ArchJava, for instance, the proposed solution supplants the standard compiler with an enhanced version.  This is all well and good, and can easily integrate with a build flow, but at design time the developer is left with constructs which, while they may compile with the extended compiler, are not (necessarily) supported by things like code coloring, refactoring tools, IntelliSense, and perhaps code analysis tools.  In my opinion, these shortcomings make this type of tool more bleeding-edge and disruptive--perhaps to a greater degree than the benefit it provides.

Thursday, September 3, 2009

Beautiful(?) Darkstar (BA Ch 3)

The intent behind Project Darkstar has great merit--anywhere you can bring code reuse to make building software easier, cheaper, faster, and more stable is a worthwhile endeavor.  The architecture laid forth provides a good framework for developers to build from, but I believe it falls short of its intention.  Whereas it was designed to be a "black box" type platform, where developers wouldn't have to understand its inner workings, it really only succeeds as a starting codebase.  It's well known that you can't (in the general case) take a standard algorithm and run it in a distributed or concurrent fashion (as the team found out), and as such, the platform misses the mark for its overall intention.  In the example where a team had built a data model on Darkstar with a "coordinator" object, it's clear that there was some other useful paradigm in play.  Perhaps the Darkstar team, instead of coaching the game team on how to make their game work better on Darkstar, should have looked at the reason for using such an object and performed some analysis to see if they could build a Darkstar-compatible base class or method library, such that the poorly performing piece could be reworked to function in the general case and provide some additional utility to game developers.

If I were on this team from the beginning, my biggest concern would be performance.  I would advocate establishing benchmarks from other games in the industry--even without knowing the inner workings of other game engines--at least to quantify what response times are expected for what types of operations.  From there, I would have made incremental performance tests under various load conditions part of the standard automated testing suite, so that performance bottlenecks could be identified as early as possible after implementation.