How programs get run: ELF binaries
The previous article in this series described the general mechanisms that the Linux kernel has for executing programs as a result of a user-space call to execve(). Because the handler for one format can hand off to another (a script, for example, is run by invoking its interpreter), this process can recurse; that recursion almost always ends with the invocation of an ELF binary program, which is the subject of this article.
An ELF file for an executable program (rather than a shared library or an object file) must always contain a program header table near the start of the file, after the ELF header; each entry in this table provides information that is needed to run the program.
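As a rough illustration of what "near the start of the file, after the ELF header" means in practice, the following sketch validates a 64-bit ELF header held in memory and reports where the program header table lives. It uses the Elf64_Ehdr and Elf64_Phdr definitions from glibc's <elf.h> and assumes the file is in the host's byte order; the function name elf64_find_phdrs() is hypothetical, and this is not the kernel's own code. Note that ET_DYN is used both by position-independent executables and by shared libraries, so the type check alone cannot tell the two apart.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: validate a 64-bit ELF header held in buf and report
 * where its program header table lives. Returns 0 on success, -1 otherwise.
 * Assumes the file uses the host's byte order. */
int elf64_find_phdrs(const unsigned char *buf, size_t len,
                     Elf64_Off *phoff, Elf64_Half *phnum)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);           /* copy to avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    /* Executables are ET_EXEC, or ET_DYN when position-independent;
     * relocatable object files (ET_REL) have no program headers to load. */
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return -1;

    if (eh.e_phnum == 0 || eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;

    /* The whole table must lie inside the buffer. */
    if (eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;

    *phoff = eh.e_phoff;
    *phnum = eh.e_phnum;
    return 0;
}
```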
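The kernel only really cares about three types of program header entries. The first type is PT_LOAD, which describes an area of the new program's running memory. This includes code and data sections that come from the executable file, together with the size of a BSS section. The BSS will be filled with zeroes, so only its length needs to be stored in the executable file. The second type is PT_INTERP, which names the run-time linker (the ELF interpreter) needed to assemble a dynamically linked program, and the third is PT_GNU_STACK, which indicates whether the program's stack should be executable. This article focuses only on what's needed to load an ELF program, rather than exploring all of the details of the format.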
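The three entry types, and the way the BSS size falls out of a PT_LOAD entry, can be sketched with the same <elf.h> definitions. The helper names below are hypothetical. Treating p_memsz minus p_filesz as the zero-filled BSS portion is the usual reading of a PT_LOAD entry, and the PF_X check on PT_GNU_STACK ignores the architecture-specific default that applies when a binary has no PT_GNU_STACK entry at all.

```c
#include <elf.h>

/* Hypothetical helper: name the three program header types the loader
 * cares about; everything else is reported as "other". */
const char *phdr_kind(const Elf64_Phdr *ph)
{
    switch (ph->p_type) {
    case PT_LOAD:      return "PT_LOAD";
    case PT_INTERP:    return "PT_INTERP";
    case PT_GNU_STACK: return "PT_GNU_STACK";
    default:           return "other";
    }
}

/* For a PT_LOAD segment, the BSS is the part of the in-memory size
 * (p_memsz) that has no backing bytes in the file (p_filesz). */
Elf64_Xword bss_size(const Elf64_Phdr *ph)
{
    if (ph->p_type != PT_LOAD || ph->p_memsz <= ph->p_filesz)
        return 0;
    return ph->p_memsz - ph->p_filesz;
}

/* PT_GNU_STACK's PF_X flag says whether the stack should be executable. */
int wants_exec_stack(const Elf64_Phdr *ph)
{
    return ph->p_type == PT_GNU_STACK && (ph->p_flags & PF_X) != 0;
}
```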
The interested reader can find much more information via the references linked from Wikipedia's ELF article, or by exploring real binaries with the objdump tool.
Once the ELF header and program header table have been read and checked, the code needs to initialize those attributes of the new program that are not inherited from the old program; the Single UNIX Specification version 3 (SUSv3) exec specification describes most of the required behavior, and a table in The Linux Programming Interface gives an excellent summary of the attributes involved.
Any other threads of the old program are killed so the new program starts with a single thread, and the signal-handling information for the process is unshared so that it can be safely altered later.
Next, the kernel determines whether the new program can generate a core dump (or have ptrace attach to it); this is disabled by default for setuid or setgid programs. Dumping is also disabled when the program file isn't readable under the current credentials. The virtual memory for the new program also needs to be set up. To improve security by helping protect against stack-overflow attacks, the highest address for the stack is typically moved downward by a random offset.
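The stack randomization step might look roughly like the following sketch, loosely modeled on the idea behind the kernel's randomize_stack_top(): pick a random number of pages, bounded by a mask, and subtract that many pages from the page-aligned stack top. The 4 KiB page size, the mask, and the function name are assumptions for illustration; the random value is passed in as a parameter so that the arithmetic can be checked deterministically.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical sketch of randomizing the top of a downward-growing stack.
 * 'rnd' stands in for a random number; 'mask' limits how many pages the
 * stack top may move. */
uintptr_t randomize_stack_top_sketch(uintptr_t stack_top, uintptr_t rnd,
                                     uintptr_t mask)
{
    /* Offset is a whole number of pages, at most 'mask' pages. */
    uintptr_t offset = (rnd & mask) << SKETCH_PAGE_SHIFT;
    /* Round the requested top up to a page boundary first. */
    uintptr_t aligned = (stack_top + SKETCH_PAGE_SIZE - 1)
                        & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
    /* The stack grows down, so the random offset moves the top downward. */
    return aligned - offset;
}
```

With a mask of 0x3ff, for instance, the stack top can move down by up to 1023 pages, just under 4 MiB.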
After mapping the PT_LOAD segments from the executable file into memory, the loader sets up zero-filled pages that correspond to the program's BSS segment. All of the preparation has now been done, and the new program can be launched.
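The BSS handling mentioned above can be modeled as a small address calculation. In this hypothetical sketch (the structure and function names are invented for illustration), the last page mapped from the file may contain stray file bytes past p_filesz, so that tail must be cleared explicitly, while any further pages up to the end of p_memsz are supplied as fresh anonymous, zero-filled memory. The page size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical description of how a PT_LOAD segment's BSS gets zeroed. */
struct bss_plan {
    uint64_t pad_start, pad_end;    /* tail of the last file-backed page to clear */
    uint64_t anon_start, anon_end;  /* fresh zero-filled (anonymous) pages */
};

/* Round x up to a multiple of page (page must be a power of two). */
static uint64_t page_up(uint64_t x, uint64_t page)
{
    return (x + page - 1) & ~(page - 1);
}

struct bss_plan plan_bss(uint64_t vaddr, uint64_t filesz, uint64_t memsz,
                         uint64_t page)
{
    struct bss_plan p;
    uint64_t file_end = vaddr + filesz;    /* first byte of the BSS */
    uint64_t mem_end = vaddr + memsz;      /* end of the whole segment */

    if (memsz <= filesz) {                 /* no BSS: nothing to zero */
        p.pad_start = p.pad_end = file_end;
        p.anon_start = p.anon_end = file_end;
        return p;
    }
    /* Clear the rest of the page that holds file_end (empty if aligned). */
    p.pad_start = file_end;
    p.pad_end = page_up(file_end, page);
    /* Whole pages beyond that come from anonymous zero-filled memory. */
    p.anon_start = p.pad_end;
    p.anon_end = page_up(mem_end, page);
    return p;
}
```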
The zero return code from the handler indicates success, and the execve() syscall returns to user space, but to a completely different user space: the process's memory has been remapped, and the restored registers have values that start the execution of the new program.
Before that happens, the kernel also populates the new program's stack with information the program will need. The first collection of information forms the ELF auxiliary vector, a collection of (id, value) pairs that describe useful information about the program being run and the environment it is running in, communicated from the kernel to user space.
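On Linux, a process can inspect its own auxiliary vector through /proc/self/auxv, which exposes it as raw (id, value) pairs. The hypothetical auxv_lookup() below scans that file; it assumes each id and value is a native unsigned long, which holds for the usual Linux ABIs. In ordinary programs, glibc's getauxval() is the simpler way to fetch a single entry.

```c
#include <elf.h>
#include <stdio.h>

/* Hypothetical helper: scan /proc/self/auxv, which holds this process's
 * auxiliary vector as (id, value) pairs of native unsigned longs, ending
 * with an AT_NULL id. Returns 1 and sets *val if 'type' is present. */
int auxv_lookup(unsigned long type, unsigned long *val)
{
    unsigned long pair[2];
    int found = 0;
    FILE *f = fopen("/proc/self/auxv", "rb");

    if (f == NULL)
        return 0;
    while (fread(pair, sizeof pair, 1, f) == 1 && pair[0] != AT_NULL) {
        if (pair[0] == type) {
            *val = pair[1];
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```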
An LWN article from Michael Kerrisk describes the contents of this vector in detail. Once this auxiliary vector is created, the code assembles the rest of the new program's stack. The required space is calculated, and then the entries are inserted from low addresses to higher ones: the argc count, the argv pointers followed by a NULL pointer, the envp pointers followed by a NULL pointer, and then the auxiliary vector, terminated by an AT_NULL entry.
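The ordering described above can be expressed as a parser that starts at the word SP points to. This is a hypothetical sketch that operates on a simulated stack held in an array of uintptr_t words, so it can be exercised without touching a real startup stack; the structure and function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an initial process stack, low to high addresses:
 * argc, argv[0..argc-1], 0, envp[...], 0, then (id, value) auxv pairs
 * ending with an id of 0 (AT_NULL). */
struct initial_stack {
    uintptr_t argc;
    const uintptr_t *argv;   /* argc string addresses, then a 0 word */
    const uintptr_t *envp;   /* string addresses, then a 0 word */
    size_t envc;
    const uintptr_t *auxv;   /* (id, value) pairs */
    size_t auxc;             /* number of pairs before AT_NULL */
};

struct initial_stack parse_initial_stack(const uintptr_t *sp)
{
    struct initial_stack s;
    const uintptr_t *p = sp;

    s.argc = *p++;                    /* the word at SP is argc */
    s.argv = p;
    p += s.argc + 1;                  /* skip argv pointers and their NULL */
    s.envp = p;
    for (s.envc = 0; s.envp[s.envc] != 0; s.envc++)
        ;
    p += s.envc + 1;                  /* skip envp pointers and their NULL */
    s.auxv = p;
    for (s.auxc = 0; s.auxv[2 * s.auxc] != 0; s.auxc++)
        ;
    return s;
}
```

For instance, a simulated stack of {2, argv0, argv1, 0, env0, 0, 6 (AT_PAGESZ), 4096, 0, 0} parses to argc = 2, one environment string, and a single auxiliary entry.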
Taken together, these items make up the contents of the top of the new program's address space (this page has a similar example). The SP register tells the program where the top of the stack is, i.e. the lowest address in use, since the stack grows downward. The values found within all of this information give the addresses of the argument strings, environment strings, and auxiliary data values, so no explicit information about the size of the random gap is needed.
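Because the pointers themselves carry the addresses, a running program can find everything by walking forward from a known point. The following sketch, with hypothetical helper names, steps from the envp array that the C runtime passed to main() over its NULL terminator to reach the auxiliary vector, and checks that the argv strings lie above the argv pointer array. It assumes Linux with glibc, that envp still refers to the original array on the initial stack, and that pointers and unsigned long have the same size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: step over the environment pointers and their NULL
 * terminator to reach the auxiliary vector placed just above them.
 * Only valid while envp is the original array on the initial stack. */
const unsigned long *auxv_after_envp(char **envp)
{
    while (*envp != NULL)
        envp++;
    return (const unsigned long *)(envp + 1);
}

/* Hypothetical check: every argv string sits at a higher address than the
 * argv pointer array (including its NULL terminator at argv[argc]). */
int strings_above_pointers(int argc, char **argv)
{
    for (int i = 0; i < argc; i++)
        if ((uintptr_t)argv[i] <= (uintptr_t)(argv + argc))
            return 0;
    return 1;
}
```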
The description so far covers a statically linked program. However, most programs are dynamically linked, meaning that required shared libraries have to be located and linked at run-time. To support this, the kernel also loads the ELF interpreter named by the program's PT_INTERP entry, in a process similar to that of loading the original program. The execution start address for the program is also set to be the entry point of the interpreter, rather than that of the program itself.
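The interpreter that the kernel loads is simply a path string stored in the file at the location described by the PT_INTERP entry. The hypothetical elf64_interp() below extracts it from a 64-bit ELF file in the host's byte order; it is a user-space sketch, not the kernel's implementation. The readelf -l command reports the same information as "Requesting program interpreter".

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the interpreter path from a 64-bit ELF file's
 * PT_INTERP entry into out. Returns 1 if found, 0 if the file has no
 * PT_INTERP (e.g. a statically linked program), -1 on error. */
int elf64_interp(const char *path, char *out, size_t outlen)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int ret = -1;

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    ret = 0;
    for (Elf64_Half i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            ret = -1;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz counts the path's terminating NUL byte. */
        if (ph.p_filesz == 0 || ph.p_filesz > outlen ||
            fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(out, 1, ph.p_filesz, f) != ph.p_filesz) {
            ret = -1;
            break;
        }
        out[ph.p_filesz - 1] = '\0';   /* don't trust the file to terminate it */
        ret = 1;
        break;
    }
out:
    fclose(f);
    return ret;
}
```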
When the execve() system call completes, execution then begins with the ELF interpreter, which takes care of satisfying the linkage requirements of the program from user space: finding and loading the shared libraries that the program depends on, and resolving the program's undefined symbols to the correct definitions in those libraries.
Some 64-bit architectures, such as x86_64, can also run 32-bit ELF binaries. So how does the kernel support these binaries?
The answer lies in fs/compat_binfmt_elf.c. This file didn't appear in our earlier list of places that register binary handlers, because it contains almost no code of its own. Instead, it redefines a set of preprocessor macros and then includes binfmt_elf.c, so the compat format handler behaves the same as the normal ELF handler described above, apart from the changes those macros make. One set of changes uses 32-bit versions of the structures describing the layout of the ELF file; similarly, the appropriate constant values for 32-bit binaries are used, which ensures that the compatibility handler only claims support for the relevant ELF binary types.
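The kernel's approach is to compile the same ELF handler source a second time with different definitions in effect. Since that cannot be shown in a single self-contained file, the sketch below imitates the idea with a closely related preprocessor technique: one function body is stamped out twice by a macro, once with the 64-bit header type and ELFCLASS64, and once with the 32-bit type and ELFCLASS32. The macro and function names are hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* One generic body, instantiated once per ELF class by swapping in a
 * different header type and class constant (hypothetical names). */
#define DEFINE_ELF_ENTRY_READER(fn, elf_class, ehdr_t)                      \
    int fn(const unsigned char *buf, size_t len, unsigned long long *entry) \
    {                                                                        \
        ehdr_t eh;                                                           \
        if (len < sizeof eh)                                                 \
            return -1;                                                       \
        memcpy(&eh, buf, sizeof eh);                                         \
        /* Claim only files of the class this instance was built for. */     \
        if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||                      \
            eh.e_ident[EI_CLASS] != (elf_class))                             \
            return -1;                                                       \
        *entry = eh.e_entry;                                                 \
        return 0;                                                            \
    }

DEFINE_ELF_ENTRY_READER(elf64_entry, ELFCLASS64, Elf64_Ehdr)
DEFINE_ELF_ENTRY_READER(elf32_entry, ELFCLASS32, Elf32_Ehdr)
```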
The preprocessor changes also redirect some of the inner functionality of the ELF handler code.
Every program that runs on a Linux system passes through the portal of execve(); as such, it's a key piece of kernel functionality that's worth understanding in detail. Although the kernel natively supports scripts and programs in other machine-code formats, program execution on a modern Linux system eventually involves running an ELF binary. ELF is a complicated format, but fortunately the kernel can ignore most of that complexity: it only needs to understand enough ELF to load segments into memory, and to invoke a user-space run-time linker program to finish the job of assembling a complete running program.
February 4. This article was contributed by David Drysdale.