Buffer overflow: Attacks and defenses for the vulnerability of the decade

Abstract: Over the past decade, buffer overflows have been the most common form of security vulnerability. More seriously, buffer overflow vulnerabilities account for the vast majority of remote network attacks, because they can give an anonymous Internet user the opportunity to gain partial or total control of a host. If buffer overflow vulnerabilities could be effectively eliminated, a large proportion of security threats would be mitigated. In this article, we study the various types of buffer overflow vulnerabilities and attacks, together with the defenses used to eliminate their impact, including our own stack protection method. We then consider how to combine these methods to eliminate buffer overflow vulnerabilities while preserving the functionality and performance of existing systems.
1. Preface

Over the past decade, buffer overflows have been the most common form of security vulnerability. More seriously, buffer overflow vulnerabilities account for the vast majority of remote network attacks, because they can give an anonymous Internet user the opportunity to gain partial or total control of a host. Since this class of attack lets anyone take control of a host, it represents an extremely serious security threat. Buffer overflow attacks have become so common because buffer overflow vulnerabilities are widespread and easy to exploit. They have also become the main means of remote attack, because a buffer overflow gives the attacker exactly what he wants: the ability to inject and execute attack code. The injected code runs with the privileges of the vulnerable program, which lets the attacker take control of the attacked host.

For example, of the five remote attacks used by Lincoln Laboratory in its 1998 intrusion detection evaluation, three were social engineering attacks that exploited trust relationships, and two were buffer overflows. Of the 13 CERT advisories issued in 1998, 9 involved buffer overflows, and in 1999 at least half of the advisories did. In a survey on the Bugtraq mailing list, two thirds of the respondents considered buffer overflow vulnerabilities a very serious security problem.

Buffer overflow vulnerabilities and attacks take many forms; we describe and classify them in Part 2. The defenses vary correspondingly with the attack, and we describe them in Part 3, including an effective defense for each type of attack. There we also introduce a stack protection method that is very effective against buffer overflow vulnerabilities without sacrificing system compatibility or performance.
Part 4 discusses how the various defenses can be combined, and Part 5 presents our conclusions.
2. Buffer overflow vulnerabilities and attacks

The purpose of a buffer overflow attack is to subvert the function of a privileged program so that the attacker can take control of that program. If the program has sufficient privileges, the attacker controls the entire host. Typically, the attacker targets a program running as root and then executes code such as "exec(sh)" to obtain a root shell, although this is not always the case. To achieve this, the attacker must accomplish two sub-goals:
1. Arrange for suitable code to be available in the program's address space.
2. Initialize registers and memory appropriately so that the program jumps to the code arranged in step 1.

We classify buffer overflow attacks according to these two sub-goals. Section 2.1 describes how the attack code is placed in the address space of the attacked program (the origin of the "buffer" part of the name). Section 2.2 describes how an attacker overflows a program buffer to transfer control to the attack code (the origin of the "overflow" part). Section 2.3 describes how the code placement and control-flow techniques of Sections 2.1 and 2.2 are combined.

2.1 Methods to arrange suitable code in the program's address space

There are two ways to arrange for attack code to be in the address space of the attacked program:

Inject it: The attacker supplies a string as input to the attacked program, which stores the string in a buffer. The string contains bytes that are native instructions for the hardware platform under attack. Here, the attacker abuses the attacked program's own buffers to store the attack code. Two details of this method are worth noting:
1. The attacker does not need to overflow any buffer to do this; enough space for the attack code can be found in ordinary buffers.
2. The buffer can be located anywhere: on the stack (automatic variables), on the heap (dynamically allocated variables), or in the static data area (initialized or uninitialized data).

It is already there: Often the code the attacker wants is already present in the attacked program. All the attacker has to do is pass some parameters to that code and then make the program jump to it. For example, suppose the attack code needs to execute "exec("/bin/sh")", and the libc library contains code that executes "exec(arg)", where arg is a pointer to a string.
The attacker then only needs to change that pointer so that it points to "/bin/sh", and jump to the corresponding instruction sequence in the libc library.

2.2 Methods to transfer program control to the attack code

All of these methods seek to alter the program's control flow so that it jumps to the attack code. The basic method is to overflow a buffer that has weak or no bounds checking, which corrupts adjacent program state and disrupts the program's normal execution. By overflowing the buffer, the attacker can overwrite the adjacent program state with a nearly arbitrary sequence of bytes and bypass the system's checks. We classify these attacks by the kind of program state that the buffer overflow seeks to corrupt. In principle, this can be any kind of state. For example, the original Morris Worm used a buffer overflow in the fingerd program to corrupt the name of a file that fingerd would execute. In practice, many buffer overflows seek to corrupt code pointers, that is, program state that points at code. These attacks differ in the kind of state they corrupt and in where that state is located in memory.

(Figure 1)

Activation records: Each time a function is called, it leaves an activation record on the stack that contains, among other things, the address to return to when the function exits. The attacker overflows an automatic variable so that this return address points to the attack code, as shown in Figure 1. Because the return address has been changed, when the function returns the program jumps to the address set by the attacker instead of the original caller. This type of buffer overflow is called a "stack smashing attack", and it is the most common form of buffer overflow attack today.

Function pointers: "void (* foo)()" declares a variable foo of type "pointer to a function returning void".
Function pointers can be located anywhere (on the stack, on the heap, or in the static data area), so an attacker only needs to find an overflowable buffer adjacent to a function pointer in any of these areas and overflow it to change the function pointer.
Later, when the program makes a call through this function pointer, it jumps to the location chosen by the attacker instead. One example of this attack targeted the superprobe program on Linux.

Longjmp buffers: C includes a simple checkpoint/rollback mechanism called setjmp/longjmp. The idiom is to call "setjmp(buffer)" to set a checkpoint and "longjmp(buffer)" to return to it. However, if the attacker can corrupt the contents of the buffer, then "longjmp(buffer)" jumps to the attacker's code instead. Like a function pointer, a longjmp buffer can be located anywhere, so the attacker only needs to find an adjacent buffer that can be overflowed. A typical example is an attack on Perl 5.003: the attacker first corrupts the longjmp buffer used to recover from detected buffer overflows, and then induces the recovery mode, which makes the Perl interpreter jump to the attack code.

2.3 Combining code injection and control-flow corruption

We now examine how code injection and control-flow corruption are combined. The simplest and most common form of buffer overflow attack combines code injection and activation record corruption in a single string. The attacker locates an automatic variable that can be overflowed and passes the program a large string that both contains the attack code and overflows the buffer to change the activation record. This is the attack template described by Levy. Because C programs commonly allocate small local buffers to hold user input or parameters, there are many instances of code vulnerable to this form of attack.

Code injection and buffer overflow need not happen in a single action. The attacker can place code in one buffer without overflowing it, and then overflow a different buffer to corrupt a code pointer. This is typically done when the overflowable buffer is too small to hold all of the attack code.
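To make the single-string template concrete without actually corrupting memory, the following C sketch models an activation record as a plain byte array: a 16-byte local buffer immediately followed by an 8-byte slot standing in for the saved return address. The layout, the sizes and all names (sim_frame, frame_unchecked_copy and so on) are hypothetical; real frame layouts depend on the compiler and ABI. The copy behaves like strcpy in that it never consults the buffer's size, but it is clamped to the simulated frame so that the demonstration itself remains well-defined C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an activation record: a 16-byte local buffer
 * immediately followed by an 8-byte slot that stands in for the saved
 * return address. Real layouts are compiler- and ABI-specific. */
#define BUF_SIZE 16
#define SLOT_SIZE 8

typedef struct {
    unsigned char mem[BUF_SIZE + SLOT_SIZE]; /* buffer bytes, then slot bytes */
} sim_frame;

/* Clear the buffer and store a known "return address" in the slot. */
void frame_init(sim_frame *f, uint64_t ret) {
    memset(f->mem, 0, BUF_SIZE);
    memcpy(f->mem + BUF_SIZE, &ret, SLOT_SIZE);
}

/* Copy input the way strcpy would: the 16-byte buffer size is never
 * checked. The copy is clamped to the whole simulated frame only so that
 * this demonstration itself has no undefined behavior. */
void frame_unchecked_copy(sim_frame *f, const char *input) {
    size_t n = strlen(input) + 1;            /* strcpy also copies the NUL */
    if (n > sizeof f->mem) {
        n = sizeof f->mem;
    }
    memcpy(f->mem, input, n);
}

/* Read back whatever the slot now holds. */
uint64_t frame_saved_ret(const sim_frame *f) {
    uint64_t ret;
    memcpy(&ret, f->mem + BUF_SIZE, SLOT_SIZE);
    return ret;
}
```

Copying a short string such as "short" leaves the slot untouched, while a 24-character string of 16 'A' bytes followed by 8 'B' bytes replaces the slot with the bytes "BBBBBBBB". In a real attack, those trailing bytes would be the address of the injected code.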
If the attacker wants to use code that is already resident instead of injecting code from outside, the attacker usually has to parameterize that code. For example, some code fragments in libc (which virtually every C program links against) execute "exec(something)", where something is a parameter. The attacker uses one buffer overflow to change that parameter, and another buffer overflow to make a code pointer point to the appropriate code fragment in libc.

3. Protection methods for buffer overflow

There are currently four basic approaches to defending against buffer overflow vulnerabilities and attacks. Section 3.1 describes the brute-force method of writing correct code.
Section 3.2 describes the operating system approach of making buffers non-executable, which prevents attackers from injecting attack code. This method stops many buffer overflow attacks, but because attackers do not necessarily need to inject attack code to carry out a buffer overflow attack (see Section 2.1), it still leaves weaknesses. Section 3.3 describes compiler array bounds checking. This method makes buffer overflows impossible and thus completely eliminates the threat, but it is relatively expensive. Section 3.4 describes an indirect method that performs integrity checks on code pointers before they are used. Although this method cannot make every buffer overflow attack impossible, it stops most of them, and the attacks it does not stop are difficult to construct. Section 3.5 then analyzes the compatibility and performance advantages of this protection method compared with array bounds checking.

3.1 Writing correct code

Writing correct code is a worthwhile but very time-consuming task, especially in a language such as C that has error-prone idioms (such as null-terminated strings). This style stems from a tradition that favors performance over correctness. Although how to write secure programs has been understood for a long time, programs with security vulnerabilities keep appearing. Therefore, tools and techniques have been developed to help inexperienced programmers write safer and more correct programs. The simplest method is to use grep to search the source code for calls to vulnerable library functions such as strcpy and sprintf, neither of which checks the length of its input. These functions are part of every version of the C standard library. To find common vulnerabilities such as buffer overflows and operating system race conditions, code auditing teams have reviewed large amounts of code.
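As a sketch of the kind of fix a grep-based review typically leads to, the C functions below replace unbounded strcpy and sprintf calls with snprintf. snprintf is given the destination size, never writes past it, and returns the length the complete output would have had, so the caller can detect truncation. The function names safe_copy and format_greeting are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Bounded replacement for strcpy(dst, src): snprintf writes at most
 * dst_size bytes, including the terminating NUL.
 * Returns 0 on success, -1 if the output was truncated or on error. */
int safe_copy(char *dst, size_t dst_size, const char *src) {
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0 || (size_t)n >= dst_size) {
        return -1;                 /* full result did not fit */
    }
    return 0;
}

/* Bounded replacement for sprintf(dst, "hello, %s", name). */
int format_greeting(char *dst, size_t dst_size, const char *name) {
    int n = snprintf(dst, dst_size, "hello, %s", name);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

With an 8-byte buffer, safe_copy stores "abc" and returns 0, but for "0123456789" it stores the truncated "0123456" and returns -1, leaving the caller to decide whether truncation is acceptable.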
However, some vulnerabilities still slip through. Even code that uses safer alternatives such as strncpy and snprintf can contain buffer overflows because of subtle coding errors such as off-by-one mistakes. The lprm program is a good example: although it had passed a code security audit, it still contained a buffer overflow vulnerability. To deal with such residual bugs, more advanced error detection tools have been developed, such as fault injection tools, which deliberately inject buffer overflow faults at random to search for vulnerable parts of a program. There are also static analysis tools that detect buffer overflows. Although these tools help programmers develop safer programs, the semantics of C do not allow them to find every buffer overflow vulnerability. Error detection techniques can therefore only reduce the number of buffer overflow vulnerabilities; they cannot guarantee that all of them have been eliminated. Unless programmers are certain that their programs are free of such vulnerabilities, they should use the protective measures described in Sections 3.2 to 3.4 to keep their programs reliable.
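The following C sketch shows, in general terms, how code that already uses strncpy can still go wrong; it is not a reconstruction of the lprm bug, which the text does not describe. strncpy does not NUL-terminate the destination when the source is at least as long as the count, and a naive fix that writes the terminator at index sizeof name is off by one. The helper name copy_name is hypothetical.

```c
#include <string.h>

/* Pitfall, shown only in this comment:
 *     char name[8];
 *     strncpy(name, src, sizeof name);   // no NUL if strlen(src) >= 8
 *     name[sizeof name] = '\0';          // off by one: writes name[8]
 * The version below terminates at the last valid index instead. */
void copy_name(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return;                          /* nowhere to put even a NUL */
    }
    strncpy(dst, src, dst_size - 1);     /* copy at most dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';            /* last valid index, not dst_size */
}
```

With an 8-byte buffer, "bartholomew" is truncated to "barthol" (7 characters plus the terminator) instead of running past the end of the array.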
3.2 Non-executable buffers

Making the data segment of the attacked program's address space non-executable makes it impossible for an attacker to execute code injected into the program's input buffers. This technique is called non-executable buffers. In fact, many older Unix systems were designed this way, but more recent Unix and MS Windows systems often place executable code in data segments dynamically for better performance and functionality. Therefore, making all program data segments non-executable would break program compatibility. However, making only the stack segment non-executable preserves compatibility to a large extent, and both Linux and Solaris have released kernel patches that do this. Because almost no legitimate program stores code on the stack, this causes very few compatibility problems. There are only two special cases in Linux where executable code must be placed on the stack:

Signal delivery: Linux delivers Unix signals to a process by emitting code onto the process stack and then raising an interrupt that executes that code. The non-executable stack patch makes the stack executable while a signal is being delivered.

GCC trampolines: gcc has been found to place executable code on the stack for "trampolines". However, disabling this feature has not caused any problems in practice; that part of gcc appears to have fallen out of use.

Non-executable stack protection is effective against attacks that inject code into automatic variables, but it offers no protection against the other forms of attack (see Section 2.1). It can be bypassed by pointing a code pointer at code already resident in the program, and other attacks could bypass it by placing the attack code in the heap or the static data segment.
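On a current Linux system, one way to observe this protection is to inspect the permissions of the process's own stack mapping. The sketch below assumes Linux's /proc/self/maps file, in which each line begins with an address range followed by a permission field such as "rw-p", and the main thread's stack is labeled "[stack]". The function name stack_is_executable is hypothetical. It only reports the permission; it does not reproduce the kernel patches mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Linux-specific sketch: report whether the main thread's stack mapping
 * is executable by reading the permission field of the "[stack]" line in
 * /proc/self/maps ("start-end perms offset dev inode path").
 * Returns 1 if executable, 0 if not, -1 if it cannot be determined. */
int stack_is_executable(void) {
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) {
        return -1;                        /* no /proc: cannot tell */
    }
    char line[512];
    int result = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, "[stack]") == NULL) {
            continue;
        }
        char perms[5] = {0};
        /* Skip the address range, then read the 4-character perms field. */
        if (sscanf(line, "%*s %4s", perms) == 1) {
            result = (perms[2] == 'x') ? 1 : 0;  /* "rwxp" vs "rw-p" */
        }
        break;
    }
    fclose(fp);
    return result;
}
```

With modern gcc or clang toolchains on Linux, the stack is normally mapped without execute permission, so the function is expected to return 0 (or -1 where /proc is unavailable); linking the same program with GNU ld's -z execstack option should change the result to 1.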
3.3 Array bounds checking

For a buffer overflow attack, injecting code is optional, but corrupting the program's control flow is essential. Therefore, unlike non-executable buffers, array bounds checking completely prevents buffer overflow vulnerabilities and attacks: if an array cannot be overflowed, it cannot be used to corrupt adjacent program state. To implement array bounds checking, every read and write to an array must be checked to ensure that it stays within the array's range. The most direct way is to check every array access, but optimization techniques can usually eliminate many of these checks. Several approaches exist:

3.3.1 Compaq C compiler

The C compiler developed by Compaq for the Alpha CPU (cc on Tru64 Unix and ccc on Alpha Linux) supports a limited form of bounds checking when the -check_bounds option is used. Its limitations are:
· Only explicit array references are checked: "a[3]" is checked, while "*(a+3)" is not.
· Because C arrays are passed to functions as pointers, accesses to an array inside the called function are not checked.
· Dangerous library functions such as strcpy are not compiled with bounds checking and remain dangerous even when bounds checking is enabled.
Because C programs so often use pointer arithmetic to access arrays and pass arrays to functions, these limitations are severe. This kind of bounds checking is useful mainly for debugging; it cannot guarantee that a program has no exploitable buffer overflow vulnerabilities.
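The checks described above can also be written by hand. The C sketch below is not how the Compaq compiler works; it pairs an array's base pointer with its length in a hypothetical int_span struct so that every read and write can be checked, including accesses inside a called function, which is exactly the case the -check_bounds option misses. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the array base travels with its length,
 * so a callee can still check every access. */
typedef struct {
    int *data;
    size_t len;
} int_span;

/* Checked read: returns false instead of reading out of range. */
bool span_get(int_span s, size_t i, int *out) {
    if (i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}

/* Checked write: refuses out-of-range indices. */
bool span_set(int_span s, size_t i, int value) {
    if (i >= s.len) {
        return false;
    }
    s.data[i] = value;
    return true;
}

/* A callee that receives the array and keeps its bound. */
int span_sum(int_span s) {
    int total = 0;
    for (size_t i = 0; i < s.len; i++) {
        int v;
        if (span_get(s, i, &v)) {   /* always true here: i < s.len */
            total += v;
        }
    }
    return total;
}
```

In span_sum, the loop condition already guarantees i < s.len, so the check inside span_get is redundant there; removing provably redundant checks of this kind is the sort of optimization that reduces the cost of bounds checking.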