When I first arrived at MIT I proved my commitment to big ideas by dumping two years into software-extended architectures, an idea similar to the implementation technology of Transmeta's Crusoe processor.
I put a layer of software between the binary and the hardware architecture, and used it for much more radical purposes than Transmeta's simple application of running x86 binaries on their VLIW. I merged instruction streams from different programs to improve processor throughput. The idea is that it is easier to make processors support wider instruction streams (more independent instructions) than deeper ones (heavily branch-predicted).
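To make the wider-versus-deeper point concrete, here is a toy sketch in Python. It is not the actual system: the function names (`rename`, `merge`, `cycles`), the round-robin merge policy, and the unit-latency in-order issue model are all assumptions made for illustration. It shows only that a dependency chain cannot fill a 2-wide machine by itself, while two such chains from different programs can once they are interleaved.

```python
from itertools import zip_longest

def rename(stream, tag):
    """Give a program its own register namespace so merged streams cannot conflict."""
    return [(f"{tag}.{dst}", tuple(f"{tag}.{s}" for s in srcs)) for dst, srcs in stream]

def merge(a, b):
    """Round-robin interleave two instruction streams (an assumed, simplistic policy)."""
    merged = []
    for x, y in zip_longest(a, b):
        if x is not None:
            merged.append(x)
        if y is not None:
            merged.append(y)
    return merged

def cycles(stream, width):
    """Count cycles for a toy in-order machine that issues up to `width`
    instructions per cycle, stopping at the first instruction that reads a
    register written earlier in the same cycle."""
    i, n = 0, 0
    while i < len(stream):
        written, issued = set(), 0
        while i < len(stream) and issued < width:
            dst, srcs = stream[i]
            if any(s in written for s in srcs):
                break  # depends on something issued this cycle; wait
            written.add(dst)
            issued += 1
            i += 1
        n += 1
    return n

# A four-instruction dependency chain: each instruction reads the previous result.
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

prog_a, prog_b = rename(chain, "A"), rename(chain, "B")
back_to_back = cycles(prog_a, 2) + cycles(prog_b, 2)  # 8 cycles
merged = cycles(merge(prog_a, prog_b), 2)             # 4 cycles
print(back_to_back, merged)
```

In this model the two programs take 8 cycles run one after the other and 4 cycles merged, because every issue slot in the merged stream holds an instruction that is independent of its neighbor.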
Here are some details in a 1997 ISCA submission, which is now an MIT tech report.
After that submission I decided to retarget my system to the SimpleScalar processor simulator. I also added static analysis so the runtime could make more intelligent decisions, merging programs only when they are in loops. I started to get some throughput wins (around 10%) that were scaling with independent processor resources, but went to InCert before I could completely waste my life in this quixotic pursuit.