Task parallelism seems like a pretty straightforward pattern from its title. Its content, however, is not quite what I expected: in many regards it reads as a summary of other patterns, and its actual substance is more of an architecture-level performance tuning guide than a way of structuring a solution. Most of my experience has been on the .NET framework on Windows, and I've often used the thread pool to achieve a task-parallelism style of implementation. The software I have worked on has generally used large-grained divisions of work, so I haven't run into situations where hardware support for task queues would specifically be necessary. However, I can see that as computer speed is pushed through parallelism rather than the clock, and algorithms need to follow suit to provide the expected speedup, this type of hardware may become necessary to leverage an ever-more-parallel yet still diverse install base.
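The thread-pool style of task parallelism mentioned above can be sketched roughly as follows (in Python here rather than .NET, purely for brevity; the `process` function and item list are hypothetical stand-ins for coarse-grained units of work):

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    # Stand-in for a coarse-grained, independent unit of work.
    return item * item

items = list(range(8))

# Each item becomes an independent task; the pool's worker threads
# pull tasks from a shared queue, which is the core of the pattern.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The shared task queue inside the pool is exactly the structure that dedicated hardware support would accelerate: workers dequeue whatever task is next, regardless of which thread produced it.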
In recursive splitting, the granularity of execution is controlled by the size of the recursion's base case. In most parallel algorithms there is a need for balance between perceived running time (in a wall-clock sense) and the total number of operations executed, which may matter in multi-user systems and servers, or under tight resource constraints. Finding that balance is quite hard to do at an abstract level: different hardware with different parallel facilities will execute the same algorithm in very different ways, so the right balance likely depends on the hardware, load, and the responsiveness and resource-limitation requirements of the system. Since recursion and iteration are essentially isomorphic, any algorithm that iterates over some data or performs a series of steps could be considered recursive, though not very naturally in some cases. Hence this pattern, taken indiscriminately, can be applied to a huge class of problems.
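A minimal sketch of how the base-case size governs granularity, under assumed names (`THRESHOLD`, `split`) and an illustrative sum over a list; the splitting here is done serially and only the leaf chunks become pool tasks, which sidesteps the hazard of blocking worker threads on nested submissions:

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 8  # base-case size; raising it trades task count for per-task work

def split(data):
    # Recursively split until chunks shrink to the base-case size.
    if len(data) <= THRESHOLD:
        return [data]
    mid = len(data) // 2
    return split(data[:mid]) + split(data[mid:])

data = list(range(100))
chunks = split(data)

# Each leaf chunk is one task; the pool sums them in parallel.
with ThreadPoolExecutor() as pool:
    partials = pool.map(sum, chunks)
total = sum(partials)

print(len(chunks), total)  # 16 4950
```

Doubling `THRESHOLD` roughly halves the number of tasks (and their scheduling overhead) at the cost of coarser load balancing, which is the hardware- and load-dependent trade-off described above.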
I haven't worked on a lot of "pure" parallel systems, so I would have to say that message passing, as it would be viewed from an SOA perspective, is the environment I'm more familiar with. I feel that messages are a pretty natural way to separate parallel executions, because they unclutter the interactions. An example of a system where the ordering of messages matters is a banking application: overdraft fees are charged whenever a customer withdraws (through a check, ATM, etc.) more than their account balance. If messages were received out of order, overdraft conditions would not occur in the desired or consistent order. With regard to deadlock detection, I think the domain in which the algorithm is implemented determines whether timeouts or deadlock detection is the better method. If all executions are tightly bounded with low-variance running times, then timeouts are quite appropriate, since the system can be reasonably certain within a bounded amount of time that a deadlock has occurred. In other situations, where running times may vary greatly and/or it is very costly to roll back a transaction unnecessarily, the cost of deadlock detection may be well worth it.
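The overdraft scenario can be made concrete with one common remedy, a sequence-number reordering buffer that releases messages only in order. Everything here is hypothetical for illustration (the `OrderedDelivery` class, the $35 fee, the starting balance):

```python
import heapq

class OrderedDelivery:
    """Buffer out-of-order messages; release them in sequence order."""
    def __init__(self):
        self.expected = 0
        self.pending = []  # min-heap keyed on sequence number

    def receive(self, seq, msg):
        heapq.heappush(self.pending, (seq, msg))
        ready = []
        # Release every message whose turn has come.
        while self.pending and self.pending[0][0] == self.expected:
            ready.append(heapq.heappop(self.pending)[1])
            self.expected += 1
        return ready

# Hypothetical account: a fee is charged when a withdrawal exceeds
# the balance, so whether the fee fires depends on message order.
balance, fees = 100, 0

def apply(op, amount):
    global balance, fees
    if op == "withdraw" and amount > balance:
        fees += 35  # assumed overdraft fee
    balance += amount if op == "deposit" else -amount

inbox = OrderedDelivery()
# Messages arrive out of order: seq 1 shows up before seq 0.
for seq, msg in [(1, ("withdraw", 150)), (0, ("deposit", 100))]:
    for op, amount in inbox.receive(seq, msg):
        apply(op, amount)

print(balance, fees)  # 50 0 -- the deposit lands first, so no fee
```

Without the buffer, the withdrawal would be applied against the pre-deposit balance of 100 and incur the fee; with it, the deposit is applied first and the outcome is the consistent one the customer expects.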