In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness. The first focuses on improving the program’s performance while reducing the resource usage while the latter focuses on ensuring that the program does what it is supposed to do.
Program analysis can be performed without executing the program (static program analysis), during runtime (dynamic program analysis) or in a combination of both.
In the context of program correctness, static analysis can discover vulnerabilities during the development phase of the program. These vulnerabilities are easier to correct than the ones found during the testing phase since static analysis leads to the root of the vulnerability.
Due to many forms of static analysis being computationally undecidable, the mechanisms for doing it will in general not always terminate with the right answer – either because they sometimes return a false negative ("no problems found" when the code does in fact have problems) or a false positive, or because they never return the wrong answer but sometimes never terminate. Despite their limitations, the first type of mechanism might reduce the number of vulnerabilities, while the second can sometimes give strong assurance of the lack of a certain class of vulnerabilities.
Incorrect optimizations are highly undesirable. So, in the context of program optimization, there are two main strategies to handle computationally undecidable analysis:
However, there is also a third strategy that is sometimes applicable for languages that are not completely specified, such as C. An optimizing compiler is at liberty to generate code that does anything at runtime – even crashes – if it encounters source code whose semantics are unspecified by the language standard in use.
The purpose of control-flow analysis is to obtain information about which functions can be called at various points during the execution of a program. The collected information is represented by a control flow graph (CFG) where the nodes are instructions of the program and the edges represent the flow of control. By identifying code blocks and loops CFG becomes a starting point for compiler made optimizations.