The generic instruction cycle for an unspecified CPU consists of the following stages:

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Fetch the operands.
4. Execute the instruction.
5. Store the result.
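As a concrete sketch, the generic cycle can be written as a minimal fetch-decode-execute loop in Python. The instruction encoding (tuples of opcode and register numbers) and the sixteen-register file here are invented purely for illustration:

```python
# Minimal sketch of the generic instruction cycle.
# Hypothetical encoding: ("add", dest, src1, src2) or ("halt",).
memory = {0: ("add", 2, 0, 1), 1: ("halt",)}   # a tiny "program"
regs = [0] * 16
regs[0], regs[1] = 3, 4
pc = 0

while True:
    instr = memory[pc]          # 1. fetch the instruction from memory
    pc += 1                     # advance the program counter
    op = instr[0]               # 2. decode: determine the operation
    if op == "halt":
        break
    if op == "add":
        a, b = regs[instr[2]], regs[instr[3]]   # 3. fetch the operands
        result = a + b                          # 4. execute
        regs[instr[1]] = result                 # 5. store the result

print(regs[2])  # → 7
```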
An example of a full instruction cycle is provided by the following VAX instruction, which uses memory addresses for all three operands.
mull x, y, product
This instruction multiplies the values stored at memory addresses x and y, storing the result at memory address product.
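A sketch of what such a memory-to-memory instruction does within its single instruction cycle (the dictionary-based "memory" and function form are purely illustrative):

```python
# Illustrative model of a three-operand, memory-to-memory multiply.
mem = {"x": 6, "y": 7, "product": 0}

def mull(src1, src2, dst):
    # One VAX-style instruction: both operand fetches, the multiply,
    # and the result store all occur within a single instruction cycle.
    a = mem[src1]        # fetch first operand from memory
    b = mem[src2]        # fetch second operand from memory
    mem[dst] = a * b     # execute and store the result to memory

mull("x", "y", "product")
print(mem["product"])  # → 42
```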
Since the MIPS is a load-store architecture, all instructions except load and store get their operands from CPU registers and store their result in a CPU register. Hence, the instruction cycle for all instructions except load and store is somewhat simpler. When all operands are in CPU registers, which can be accessed within a single clock cycle, fetching operands and storing the results can occur within the same clock cycle as execution (add, subtract, etc.). For example, suppose R0, R1, R2 ... R15 are CPU registers. Then the operation
R0 ← R4 + R7 # One clock cycle
is a simple, atomic operation inside the CPU, and therefore is not regarded as multiple steps in the instruction cycle. If one of the operands were in memory instead of a register, on the other hand, fetching it from memory and placing it into a register would be a separate step.
R4 ← Mem[address1] # Multiple clock cycles
R0 ← R4 + R7 # One clock cycle
The specific cycle for a load instruction is:

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Compute the effective memory address.
4. Read the value from memory.
5. Store the value in the destination register.
The specific cycle for a store instruction is:

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Compute the effective memory address.
4. Read the value from the source register.
5. Write the value to memory.
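The load and store cycles just described can be sketched in Python as well. The base-plus-offset addressing and the register numbering below are assumptions for illustration, not a specific machine's encoding:

```python
# Illustrative load and store cycles on a tiny register/memory model.
regs = [0] * 16
data_mem = {100: 42}

def load(dest, base, offset):
    addr = regs[base] + offset    # compute the effective memory address
    regs[dest] = data_mem[addr]   # read memory, store value in register

def store(src, base, offset):
    addr = regs[base] + offset    # compute the effective memory address
    data_mem[addr] = regs[src]    # read register, write value to memory

regs[1] = 96
load(0, 1, 4)         # R0 ← Mem[R1 + 4] = Mem[100] = 42
regs[0] += 1
store(0, 1, 8)        # Mem[R1 + 8] ← R0 = 43
print(data_mem[104])  # → 43
```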
Note that in any case, most of the instruction cycle is overhead. Only the execute stage actually does something considered useful by the user, and all the other stages are fluff, either preparation or wrap-up.
One way to increase the density of useful work in a program is by making the instructions more complex. If the execute stage accomplishes more for the same amount of fetching, decoding and storing overhead, then the program will be shorter, and will run faster. This is the philosophy behind CISC architectures.

A classic example of this idea is the VAX POLY instruction, which evaluates a polynomial given an array of coefficients, the order of the polynomial, and the value of x. It accomplishes in one instruction cycle what would otherwise require a loop, and hence dozens of instruction cycles.
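For comparison, evaluating a polynomial without such an instruction takes a loop like the following, sketched in Python using Horner's rule. The function name, argument order, and highest-degree-first coefficient convention are illustrative choices, not the VAX's exact operand format:

```python
def poly(coeffs, order, x):
    """Evaluate coeffs[0]*x^order + ... + coeffs[order] by Horner's rule.

    Each loop iteration corresponds to a full multiply-add instruction
    cycle on a machine without a polynomial-evaluation instruction.
    """
    result = coeffs[0]
    for i in range(1, order + 1):
        result = result * x + coeffs[i]
    return result

# 2x^2 + 3x + 1 at x = 5  →  2*25 + 3*5 + 1 = 66
print(poly([2, 3, 1], 2, 5))  # → 66
```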
The cost of overhead can also be alleviated without actually reducing it. The primary technique to achieve this is called pipelining. A pipelined CPU overlaps the execution of two or more instructions, so that while one instruction is executing, the next one is already being decoded, and the one after that is being fetched. Pipelining is discussed in Chapter 17, A Pipelined Implementation.
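The effect of this overlap shows up in a simple cycle count. The five-stage depth and the no-stall, one-instruction-per-clock assumptions below are idealizations for illustration:

```python
def cycles_unpipelined(n_instructions, stages=5):
    # Each instruction runs all its stages to completion
    # before the next instruction begins.
    return n_instructions * stages

def cycles_pipelined(n_instructions, stages=5):
    # The first instruction takes `stages` cycles to complete; each
    # later one finishes one cycle after its predecessor (no stalls).
    return stages + (n_instructions - 1)

print(cycles_unpipelined(100))  # → 500
print(cycles_pipelined(100))    # → 104
```

For a long run of instructions the pipelined count approaches one cycle per instruction, even though each individual instruction still passes through every stage.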