Uncontrolled format string is a type of software vulnerability discovered around 1989 that can be used in security exploits. Previously thought harmless, format string exploits can be used to crash a program or to execute harmful code. The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf()
. A malicious user may use the %s
and %x
format tokens, among others, to print data from the call stack or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n
format token, which commands printf()
and similar functions to write the number of bytes formatted to an address stored on the stack.
A typical exploit uses a combination of these techniques to force a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output and the %x
token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n
format token can then overwrite with the address of the malicious code to execute.
This is a common vulnerability because format bugs were previously thought harmless and resulted in vulnerabilities in many common tools. MITRE's CVE project lists roughly 500 vulnerable programs as of June 2007, and a trend analysis ranks it the 9th most-reported vulnerability type between 2001 and 2006.
Format string bugs most commonly appear when a programmer wishes to print a string containing user supplied data. The programmer may mistakenly write printf(buffer)
instead of printf("%s", buffer)
. The first version interprets buffer
as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended.