Metacircular VMs offer an attractive model for both reusability and performance. Because the VM is implemented in the same language it hosts, features in the VM and in applications co-evolve, and the VM itself can benefit from the very features it is meant to provide. It's a bit hard to wrap your head around, and at times it feels like the proverbial chicken-and-egg problem, but the bootstrapping process works out these kinks.
This bootstrapping process is probably one of the few disadvantages of such an architecture. Whereas a VM written in a natively compiled language would "just run", a metacircular VM first has to work through a complex process of image generation and layout.
As for the threading model, it would certainly be advantageous to take advantage of the best threading performance available, likely by keeping the abstraction over the kernel as thin as possible. But, as mentioned, there are some circumstances where the JVM knows more about what's going on (such as the "uncontended locking" case mentioned), and can therefore achieve greater efficiency. For that reason, I believe a pluggable threading model is the better choice, so that the JVM can be tailored to the situation when necessary.
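To make the idea of a pluggable threading model a little more concrete, here is a minimal Java sketch of the shape such a plug-in point might take. The `ThreadingModel` interface, the two implementations, and the `threading.model` system property are hypothetical names invented for illustration, not part of any real JVM. A real VM would plug in far lower (thread creation, parking, lock inflation) rather than on top of `java.util.concurrent`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical plug-in point: the VM hands every new task to the active model.
interface ThreadingModel {
    void spawn(Runnable task);
    void shutdown(); // waits for all spawned tasks to finish
    String name();
}

// One kernel-backed platform thread per task: the thinnest layer over the OS.
final class OneToOneModel implements ThreadingModel {
    private final List<Thread> threads = new ArrayList<>();

    public synchronized void spawn(Runnable task) {
        Thread t = new Thread(task);
        threads.add(t);
        t.start();
    }

    public void shutdown() {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(threads);
        }
        try {
            for (Thread t : snapshot) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public String name() { return "one-to-one"; }
}

// Runs tasks on a fixed pool of carrier threads, avoiding per-task thread creation.
final class PooledModel implements ThreadingModel {
    private final ExecutorService pool;

    PooledModel(int carriers) { pool = Executors.newFixedThreadPool(carriers); }

    public void spawn(Runnable task) { pool.execute(task); }

    public void shutdown() {
        pool.shutdown(); // stop accepting tasks; already-queued tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public String name() { return "pooled"; }
}

public class Main {
    // Hypothetical selection hook, e.g. driven by a startup flag.
    static ThreadingModel select(String kind) {
        return "pooled".equals(kind) ? new PooledModel(4) : new OneToOneModel();
    }

    // Workload code is identical regardless of which model is plugged in.
    static int runWorkload(ThreadingModel model, int tasks) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            model.spawn(done::incrementAndGet);
        }
        model.shutdown();
        return done.get();
    }

    public static void main(String[] args) {
        String kind = System.getProperty("threading.model", "one-to-one");
        ThreadingModel model = select(kind);
        System.out.println(model.name() + " completed " + runWorkload(model, 100) + " tasks");
    }
}
```

Run as-is, it prints "one-to-one completed 100 tasks"; with `-Dthreading.model=pooled` it prints "pooled completed 100 tasks". The point of the design is that `runWorkload` never changes: the strategy is chosen at startup, so the VM can be tailored to the situation without touching application code.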