Python GIL, threading and multicore hardware

Guido van Rossum relayed this very insightful talk by David Beazley about the innards of Python's global interpreter lock (blip.tv link).

My position with regard to the GIL is pretty standard: like everyone else, I'd rather do without it and benefit from true thread concurrency on multicore hardware. But I understand it's a hard problem to solve, so either it's here to stay or getting rid of it would mean fairly drastic changes to the underlying interpreter. The only non-GIL Python platform I'm aware of is Jython (in a nutshell, the Python language running on the Java Virtual Machine).

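Beazley, however, showed that it's worse than everyone thinks: on multicore hardware, threads fighting over the GIL can make CPU-bound threaded code run slower than the same work done sequentially.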
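A rough way to check this on your own machine is to time a CPU-bound countdown run twice in a row against the same countdown run in two threads at once. This is only a sketch of that kind of experiment as I understand it from the talk; the function names and the loop size are my own choices, and the numbers will vary with the interpreter version and the hardware.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run countdown(n) `repeats` times back to back; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        countdown(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    """Run countdown(n) in `workers` threads at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=countdown, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary size, just large enough to measure
    print("sequential: %.2fs" % time_sequential(n))
    print("threaded:   %.2fs" % time_threaded(n))
```

If the threaded run is no faster than the sequential one (or slower), the two threads are not getting any benefit from the second core.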

How do Stackless Python's microthreads compare? Stackless is also bound by the GIL, but its microthreads are not scheduled at the OS level (see Green Threads). Understandably, they should not be affected by GIL battles.

We have a lot of code using the Twisted framework that falls into the 'mix of CPU-bound and I/O-bound threads' category that Beazley describes. We typically offload blocking I/O with a deferToThread call (to memcached, to a database, to another XML-RPC service, whatever). All our hardware is multicore (typically 8 cores per system), and we run multiple Twisted instances to achieve scalability. Now I'm wondering: if we pin each Twisted instance and its child threads to its own CPU, will we see better performance because we'll be avoiding GIL battles?
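For anyone who wants to try the pinning experiment, here is a minimal sketch. It assumes Linux and Python 3.3 or later, where os.sched_setaffinity is available; the helper names pin_to_cpu and allowed_cpus are mine, not part of Twisted. The call would need to happen early in each instance, before any worker threads are started, so that the threads inherit the restriction. On other setups, launching each instance under the taskset command is the usual alternative.

```python
import os


def allowed_cpus():
    """Return the CPUs this process may run on, or [] if the API is missing."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return []


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU.

    Returns the resulting CPU set, or None on platforms without
    os.sched_setaffinity (for example macOS and Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    cpus = allowed_cpus()
    if cpus:
        # Illustration only: a real launcher would give each instance
        # a different CPU rather than always taking the first one.
        print("pinned to:", pin_to_cpu(cpus[0]))
    else:
        print("CPU affinity is not supported on this platform")
```

Whether this actually helps is exactly the open question: it should remove cross-core fights over the GIL, but it also gives up any parallelism from code that releases the GIL, so it needs to be measured under a realistic load.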