What do all those transistors do?
The CPU in your laptop or desktop has a lot of transistors in it. The Core i7-6700HQ that I'm typing this on has 1.35 billion of the little guys. But back in the day one of the earliest computers, ENIAC, had only about 20 thousand vacuum tubes, which more or less fulfilled the same role as transistors do now. So if we were able to do useful mathematical operations with just 20,000, what do all those extra transistors we've added accomplish? After all, most of the increase in the speed at which we run computers, the clock rate, has come from replacing large and slow transistors with smaller and faster ones, not from adding more of them.
Well first, what was ENIAC doing with its vacuum tubes? A single transistor isn't very useful. If you're willing to use a resistor too you can perform an operation like making an output the logical AND of one input and the inverse of a second input, call it AND(A, NOT(B)). But that circuit is a very jury-rigged thing which will be slow, unreliable, and fairly power hungry. You can make a much more solid AND-ish circuit, NOT(AND(A,B)) or NAND, from a couple of transistors and a resistor if you don't mind being power hungry. But given that Moore's Law has made transistors cheaper and cheaper relative to resistors, anybody these days would use a four-transistor NAND circuit, which is faster and more power efficient than the resistor-and-transistor equivalent.
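To make the gate logic a bit more concrete, here is a small Python sketch of the two functions mentioned above, treating each wire as a 0 or a 1. The function names are my own, and this only models the logic, not the electrical behavior (speed, power) that makes one circuit better than another. It also shows why NAND is such a popular building block: the other gates here are wired up out of nothing but NANDs.

```python
def nand(a: int, b: int) -> int:
    """NOT(AND(a, b)) on single bits (0 or 1)."""
    return 1 - (a & b)

def not_(a: int) -> int:
    # Tying both NAND inputs together gives an inverter.
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # Inverting a NAND gives back a plain AND.
    return not_(nand(a, b))

def and_not(a: int, b: int) -> int:
    # The function from the text: AND(A, NOT(B)).
    return and_(a, not_(b))

def truth_table(gate):
    """Outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]

print("NAND         ", truth_table(nand))     # [1, 1, 1, 0]
print("AND(A, NOT B)", truth_table(and_not))  # [0, 0, 1, 0]
```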
What if you want to add two numbers together? Digital circuits output zeros and ones, so first you have to decide how many base 2 digits (bits) you want to be able to represent, since you need enough output wires for the largest number the machine can handle. In a modern all-transistor design you'll be using 16 transistors for every bit of the addition. For a 32 bit adder, which many processors used around 2000 and which is roughly equivalent to what ENIAC used, that'll be 16 × 32 = 512 transistors total.
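Here is a rough sketch of how that adder works, one bit at a time. I'm assuming the simplest possible arrangement, a ripple-carry adder, where each bit's carry feeds the next bit; real processors use cleverer carry schemes that cost more transistors but run faster. The names are made up for illustration, and the 16 transistors per bit is just the figure quoted above.

```python
TRANSISTORS_PER_BIT = 16  # figure quoted in the text, not derived here

def full_adder(a, b, carry_in):
    """Add three single bits; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def ripple_add(x, y, width=32):
    """Add two unsigned numbers bit by bit; return (result, final_carry)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

def adder_transistors(width=32):
    """Transistor count for one full adder per bit."""
    return TRANSISTORS_PER_BIT * width

print(ripple_add(5, 7))          # (12, 0)
print(ripple_add(2**32 - 1, 1))  # (0, 1): the sum no longer fits in 32 bits
print(adder_transistors(32))     # 512
```

The second call shows why you have to pick the width up front: with only 32 output wires, anything past the largest 32 bit number wraps around and all that is left is the carry.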
But you don't want a computer that only adds numbers. You want a wide variety of instructions you can execute, you want some way of choosing which instruction you execute next, and you want to interact with memory. At this point you're up to tens of thousands of transistors. That will give you a CPU that can do all the things ENIAC could do.
Now let's say you don't want your entire operating system to crash when there is a bug in any program that you run. Keeping programs from trampling on each other's memory involves more transistors. And you probably want to be able to start one multi-cycle instruction before the last one finishes (pipelining). This might get you up to executing one instruction every other clock cycle on average. That'll cost transistors as well. This will grow your chip up to hundreds of thousands of transistors and will give you performance like the Intel 386 from the mid 80s.
But this will still seem very slow compared to the computers we use nowadays. You want to be able to execute more than one instruction at a time. Doing that isn't very hard, but figuring out which instructions can be executed in parallel and still give you the right result is actually very hard and takes a lot of transistors to do well. This is what we call out-of-order execution, like the Intel Pentium Pro had in the mid 90s, and it will take about 10 million transistors in total.
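A toy version of the "which instructions can run together" question looks something like the sketch below. It is heavily simplified and everything in it is my own assumption: every instruction takes one cycle, there are unlimited execution units, and the only thing that makes an instruction wait is needing a value that an earlier instruction hasn't produced yet. A real chip has to do this bookkeeping in hardware, every cycle, for dozens of instructions in flight, which is where the transistors go.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction

def earliest_cycles(program):
    """For each instruction, the earliest cycle in which it could start."""
    ready_at = {}  # register -> cycle when its newest value becomes available
    cycles = []
    for ins in program:
        # An instruction can start once all of its inputs are ready.
        start = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        cycles.append(start)
        ready_at[ins.dest] = start + 1  # result is available one cycle later
    return cycles

program = [
    Instr("r1", ("a", "b")),    # r1 = a + b
    Instr("r2", ("c", "d")),    # r2 = c + d   (independent of r1)
    Instr("r3", ("r1", "r2")),  # r3 = r1 + r2 (must wait for both)
    Instr("r4", ("e", "f")),    # r4 = e + f   (independent of everything)
]
print(earliest_cycles(program))  # [0, 0, 1, 0]
```

Three of the four instructions can start in the same cycle, and the last one can jump ahead of the third even though it comes later in the program. That reordering is the "out of order" part.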
But now the size of the pool of memory that we're working with is getting bigger and bigger. Most people these days have gigabytes of memory in their computers. The bigger the pool is, the longer it takes to grab any arbitrary byte from it. So what we do is have a series of pools: a very fast 10kB one, a slightly slower 100kB one, a big 10MB one on the chip, and then finally your 8GB of main memory. And we have the chip figure out what data to put where, so that most of the time when we go to look for some data it's in the nearby small pool and doesn't take very long to get, and we're only waiting to hear back from main memory occasionally. This and growing the structures that look forward for more instructions to execute are how computers changed until the mid 2000s. Computers also went from 32 to 64 bits so that they could refer to more than 4GB of memory: 32 bits can only name 4,294,967,296 different locations (numbered 0 through 4,294,967,295), so any memory location beyond that couldn't be used by a 32 bit computer. This'll get us up to 100 million transistors.
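To see why the series of pools pays off, here is a back-of-the-envelope calculation of the average time to fetch a piece of data. The latencies and hit rates below are numbers I made up for illustration, not measurements of any real chip; only the idea (check the small fast pool first, fall through to the bigger slower ones) comes from the description above.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per access.

    levels: list of (latency_cycles, hit_rate), fastest pool first.
    memory_latency: cycles to reach main memory when every pool misses.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        expected += reach * latency  # everyone who gets here pays the lookup
        reach *= 1.0 - hit_rate      # only the misses go on to the next pool
    return expected + reach * memory_latency

# Hypothetical numbers: (latency in cycles, fraction of lookups that hit)
pools = [
    (4, 0.90),   # the very fast ~10kB pool
    (12, 0.80),  # the slightly slower ~100kB pool
    (40, 0.75),  # the big ~10MB pool on the chip
]
MAIN_MEMORY_CYCLES = 200  # also hypothetical

print(average_access_time(pools, MAIN_MEMORY_CYCLES))  # roughly 7 cycles
print(average_access_time([], MAIN_MEMORY_CYCLES))     # 200.0 with no pools at all
```

With these made-up numbers the average access costs about 7 cycles instead of 200, even though only one access in two hundred ever reaches main memory.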
And from the mid 2000s to the mid 2010s we've made the structures that figure out which instructions to execute next even bigger and more complicated, letting us execute even more instructions at once. As we grow performance this way, the number of transistors we need grows as the square of the performance, on average. And we've added more cores on the same chip, letting us grow performance linearly with transistors as long as software people can figure out ways to actually use all the cores. And now we're up to billions of transistors.
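The difference between those two ways of spending transistors is easy to see with a little arithmetic. This sketch just restates the two rules of thumb above as functions (the names are mine), with the multi-core case assuming the software uses every core perfectly.

```python
def transistors_for_bigger_core(speedup):
    # Rule of thumb from the text: transistors grow as the square of performance.
    return speedup ** 2

def transistors_for_more_cores(speedup):
    # Best case from the text: performance grows linearly with transistors.
    return speedup

# Relative transistor budget needed for a 2x, 4x and 8x speedup.
for speedup in (2, 4, 8):
    print(speedup,
          transistors_for_bigger_core(speedup),
          transistors_for_more_cores(speedup))
```

So an 8x faster single core costs something like 64 times the transistors, while 8 perfectly used cores cost only 8 times as many.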
All this raises the question of whether you could just take the design of the 10,000 transistor cores you would have used back in the day and put 100 of those cores in a CPU instead of the 4 you'd normally buy. To some extent you can do that. If you want to have them all talk to each other and to a good amount of memory you have to increase their width to 64 bits, but that doesn't take so very many transistors if the rest of the design stays simple. And they'll be slower individually, but each of the 10x transistor steps causes something like a doubling of performance rather than a 10x increase in performance. The problem is that it's hard to write software in such a way that the work can be broken up neatly into 100 different threads of execution. And some operating systems, such as Windows, tend to have problems dividing up work efficiently between more than 30 or so threads.
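How much the "hard to break up the work" problem hurts can be estimated with Amdahl's law, a standard formula I'm bringing in here rather than something from the argument above. The sketch compares 100 small cores against 4 big ones; the assumption that a big core is 8 times faster than a small one is a number I picked for illustration.

```python
def run_time(parallel_fraction, cores, core_speed):
    """Relative time to finish a job (1.0 = one small core doing it all).

    parallel_fraction: share of the work that can be split across cores.
    core_speed: speed of each core relative to one small core.
    """
    serial = (1.0 - parallel_fraction) / core_speed            # stuck on one core
    parallel = parallel_fraction / (cores * core_speed)        # spread over all cores
    return serial + parallel

BIG_CORE_SPEED = 8.0  # hypothetical: one big core = 8 small cores' worth of speed

for fraction in (0.5, 0.9, 0.99, 1.0):
    small = run_time(fraction, cores=100, core_speed=1.0)
    big = run_time(fraction, cores=4, core_speed=BIG_CORE_SPEED)
    print(f"{fraction:.2f} parallel: 100 small = {small:.4f}, 4 big = {big:.4f}")
```

Under these assumptions the 100 small cores only pull ahead when nearly all of the work can be divided up; if even a tenth of it has to run on one core, the 4 big cores finish first.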
In theory you could have 2 large cores for cases where the software doesn't support a lot of division of work and another 30 tiny cores to handle cases where the work can be divided easily. The operating system would have to be aware of this, though, and prefer running tasks on the fast cores first, with the remaining tasks trickling down to the slower cores. Something of the sort has been done on mobile phones, where you might have 2 fast cores and 4 slow cores. But on your laptop you mostly see this sort of thing with a few large cores running your applications and a gaggle of small cores in the GPU doing the graphics processing.
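The "fast cores first" policy can be sketched in a few lines. This is only a toy with names I made up: it hands out cores to a list of tasks in order and never moves anything afterward, whereas a real operating system scheduler has to keep rebalancing as tasks start, stop, and change how much work they have.

```python
def assign_cores(tasks, fast_cores=2, slow_cores=30):
    """Place tasks in order: fast cores fill up first, the rest spill to slow cores."""
    placement = {}
    for i, task in enumerate(tasks):
        if i < fast_cores:
            placement[task] = f"fast{i}"
        elif i < fast_cores + slow_cores:
            placement[task] = f"slow{i - fast_cores}"
        else:
            placement[task] = "waiting"  # more tasks than cores
    return placement

# The phone-style layout from the text: 2 fast cores and 4 slow ones.
print(assign_cores(["browser", "music", "sync", "backup"], fast_cores=2, slow_cores=4))
```

Here the first two tasks land on the fast cores and the other two trickle down to slow ones.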