I haven't had a great deal of experience with testing parallel programs. As I believe I've stated in prior posts, most of the applications I've built have used totally isolated processing threads, accessing common resources only through a database server--where all but a general awareness of concurrency is abstracted away. In the couple of instances where I have tested the concurrent behavior of these programs, I've generally relied on stress testing alone.
While Moore's Law may be one of the most widely known conjectures related to computing, I'd argue that it is still trumped by Murphy's Law: "Anything that can go wrong, will go wrong." When we write test suites for an application, we're hoping that the "will go wrong" rears its ugly head within the very finite amount of time allotted. Even stress tests running on large farms of machines pale in comparison to the diversity of executions that will happen "in the wild"--across an expansive space of hardware and software platforms, user interactions, environmental factors, and configurations, and possibly across years or even decades of deployment. Hence, even a stress test that has run for weeks without failing in a test environment cannot purport to have captured all behaviors. Additionally, since any code change, whether in the application or in the platform, can totally invalidate a stress test, even the most exhaustive practical stress test is only good for as long as the identical bits are in play--and we all know how quickly requirements change and bugs are found.
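To put a rough number on that diversity: even an idealized model of `t` threads, each executing `n` atomic steps, admits a multinomial number of interleavings. (This is an illustrative back-of-the-envelope calculation of my own, not a measurement of any real workload.)

```python
from math import factorial

def interleavings(threads, steps):
    """Number of distinct interleavings of `threads` threads, each
    executing `steps` atomic actions: the multinomial coefficient
    (threads*steps)! / (steps!)^threads."""
    return factorial(threads * steps) // factorial(steps) ** threads

# Even tiny programs have an enormous schedule space:
print(interleavings(2, 10))  # 184756 interleavings for just 2 threads x 10 steps
print(interleavings(4, 10))  # astronomically many for 4 threads
```

A stress test samples this space essentially at random, which is why weeks of clean runs say so little about the schedules it never happened to hit.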
The small scope hypothesis is quite an interesting proposition. Being able to bound the complexity of test cases is certainly a desirable trait from the perspective of a real-world tool such as CHESS. I don't know that I can offer any specific critical analysis on this subject, but would rather argue that regardless of any formal proof of such a concept, code is written by people, and to the extent that they are unrestricted in the concurrency structure of the code they write, they will write unrestricted code (Murphy's Law again!). Hence, it's my belief that only through empirical study will we find out what percentage of bugs meets the criteria of this hypothesis for given parameters.
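To make the idea concrete, here is a toy sketch (my own construction, not CHESS's actual algorithm) of what bounding test-case complexity can look like: systematically enumerating the schedules of an idealized two-thread program while capping the number of preemptions. Even with a single preemption allowed, the classic lost-update bug surfaces:

```python
def schedules(remaining, bound, last=None, preempts=0, prefix=()):
    """Yield every interleaving of threads with the given remaining
    step counts, allowing at most `bound` preemptions (switches away
    from a thread that still has steps left)."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left == 0:
            continue
        cost = preempts
        if last is not None and tid != last and remaining[last] > 0:
            cost += 1  # switching away from a still-runnable thread
        if cost > bound:
            continue
        nxt = remaining[:tid] + (left - 1,) + remaining[tid + 1:]
        yield from schedules(nxt, bound, tid, cost, prefix + (tid,))

def run(schedule):
    """Classic lost-update bug: each thread reads the shared counter,
    then writes back the incremented value -- non-atomically."""
    x = 0            # shared counter
    tmp = [0, 0]     # per-thread register
    pc = [0, 0]      # per-thread program counter
    for tid in schedule:
        if pc[tid] == 0:
            tmp[tid] = x        # step 0: read
        else:
            x = tmp[tid] + 1    # step 1: write back
        pc[tid] += 1
    return x

# With zero preemptions every schedule is correct; a single
# preemption already exposes the lost update.
ok = all(run(s) == 2 for s in schedules((2, 2), bound=0))
buggy = [s for s in schedules((2, 2), bound=1) if run(s) != 2]
```

In this toy model the bug needs only two threads and one preemption to manifest--exactly the kind of "small scope" claim that, as I argue above, deserves empirical validation on real codebases.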
I could envision a monitor on top of CHESS that profiles the efficiency of various interleavings of execution--for example, by tracking how locking patterns occur, the duration of thread sleeps, and the relative priority of threads. If certain interleavings are found to be particularly efficient, design work could then be undertaken to guide execution toward those interleavings.
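As a sketch of what such a monitor might record, here is a hypothetical `ProfiledLock` wrapper (the name and design are my invention, not part of CHESS) that logs, per thread, how long the lock was waited on and how long it was held--the raw material for spotting efficient and inefficient locking patterns:

```python
import collections
import threading
import time

class ProfiledLock:
    """Hypothetical lock wrapper that records, per thread, how long
    each acquisition waited and how long the lock was then held."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.wait_times = collections.defaultdict(list)
        self.hold_times = collections.defaultdict(list)
        self._acquired_at = 0.0  # only touched while holding the lock

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        tid = threading.get_ident()
        self.wait_times[tid].append(time.perf_counter() - start)
        self._acquired_at = time.perf_counter()
        return self

    def __exit__(self, *exc):
        tid = threading.get_ident()
        self.hold_times[tid].append(time.perf_counter() - self._acquired_at)
        self._lock.release()

# Usage: two threads contending for a shared counter.
lock = ProfiledLock("counter")
counter = [0]

def worker():
    for _ in range(100):
        with lock:
            counter[0] += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# lock.wait_times / lock.hold_times now hold per-thread samples.
```

Long waits relative to short holds would suggest contention worth redesigning around--the kind of signal that could guide execution toward the more efficient interleavings.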
The risk posed by misunderstood semantics of synchronization primitives is minimal to the extent that the conservative option is chosen when constructing the happens-before graph.
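A minimal sketch of what the conservative option can mean, assuming a vector-clock formulation of happens-before (the trace and values below are illustrative, not drawn from CHESS): when a primitive's exact semantics are unclear, treating it as a genuine release/acquire pair adds a happens-before edge, so the accesses it separates are treated as ordered rather than concurrent.

```python
def merge(c1, c2):
    """Pointwise max: a thread's clock after synchronizing with an event."""
    return tuple(max(a, b) for a, b in zip(c1, c2))

def ordered(c1, c2):
    """True if the event with clock c1 happens-before the one with c2."""
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

# Trace: thread 0 writes x, then passes through some primitive;
# thread 1 passes through the same primitive, then reads x.
write_x = (1, 0)   # write by thread 0
release = (2, 0)   # thread 0's clock at the primitive

# Conservative option: treat the primitive as a real acquire,
# merging the release clock into thread 1's clock.
acquire = merge((0, 1), release)        # -> (2, 1)
read_x = (acquire[0], acquire[1] + 1)   # subsequent read: (2, 2)
assert ordered(write_x, read_x)         # ordered: not reported as a race

# Without that edge, the same two accesses look concurrent:
read_unsync = (0, 2)
assert not ordered(write_x, read_unsync)
assert not ordered(read_unsync, write_x)
```

The price of the conservative edge is that genuinely racy schedules may go unexplored, but a misunderstood primitive never produces a bogus ordering violation.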