Speedup

In computer architecture, speedup is a process for increasing the performance between two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.

Speedup can be defined for two different types of quantities: latency and throughput.

Latency of an architecture is the reciprocal of the execution speed of a task:

where

Throughput of an architecture is the execution rate of a task:

where

Latency is often measured in seconds per unit of execution workload. Throughput is often measured in units of execution workload per second. Another unit of throughput is instructions per cycle (IPC) and its reciprocal, cycles per instruction (CPI) is another of unit of latency.

Speedup is dimensionless and defined differently for each type of quantity so that it is a consistent metric.

Speedup in latency is defined by the following formula:

where

Speedup in latency can be predicted from Amdahl's law or Gustafson's law.

Speedup in throughput is defined by the following formula:

where

We are testing the effectiveness of a branch predictor on the execution of a program. First, we execute the program with the standard branch predictor on the processor, which yields an execution time of 2.25 seconds. Next, we execute the program with our modified (and hopefully improved) branch predictor on the same processor, which produces an execution time of 1.50 seconds. In both cases the execution workload is the same. Using our speedup formula, we know

Our new branch predictor has provided a 1.5x speedup over the original.

We can also measure speedup in cycles per instruction (CPI) which is a latency. First, we execute the program with the standard branch predictor, which yields a CPI of 3. Next, we execute the program with our modified branch predictor, which yields a CPI of 2. In both cases the execution workload is the same and both architectures are not pipelined nor parallel. Using the speedup formula gives

...
Wikipedia