#CS252 #Computer-Science

What Is a Program?

  • A program is a file in a special format that contains all the necessary information to load an application into memory and make it run
  • A program file includes:
    • Machine instructions
    • Initialized data
    • List of library dependencies
    • List of memory sections that the program will use
    • List of undefined values in the executable that will not be known until the program is loaded into memory
  • There are multiple executable file formats
    • ELF - Executable and Linkable Format
      • This is used in most UNIX systems
    • COFF - Common Object File Format
      • Used in Windows, where it is the basis of the PE (Portable Executable) format
    • a.out - Used in BSD (Berkeley Software Distribution) and early UNIX
      • Very restrictive and no longer in common use
    • Side note: BSD UNIX and AT&T UNIX are the predecessors of the modern UNIX systems

Memory of a Program

  • Programs see memory as an array of bytes from 0 to 2^64 - 1
    • (0x0000000000000000-0xFFFFFFFFFFFFFFFF), assuming a 64-bit architecture
  • The memory is organized into different sections, called “memory mappings”
  • Each section has its own permissions: read, write, execute, or a combination of them
    • Text - This is where the actual code is stored, the instructions that the program runs
    • Data - These are the initialized global variables
    • BSS - These are the uninitialized global variables. They are all set to 0 to start with
    • Heap - This is where the memory returned by malloc or new comes from. It grows upwards
    • Stack - This is where the local variables and return addresses are stored. It grows downwards
    • Dynamic Libraries - These are the libraries shared with other processes
      • Each dynamic library has its own text, data, and BSS
  • Each program has its own view of memory; programs do not all see the same memory
    • This is called the “address space” of the program
    • If a program modifies its address space, it won’t affect another one
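// Program hello.c
#include <stdlib.h> // Declares malloc and free

int a = 5; // Stored in data section
int b[20]; // Stored in bss
int main() { // Code is stored in text
	int x; // Stored in stack
	int *p = (int*) malloc(sizeof(int)); // p itself is in stack; the int it points to is in heap
	free(p); // Give the heap memory back
	return 0;
}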
  • Between each memory section there may be gaps without a memory mapping
  • If a program attempts to access one of these gaps, the OS will send a SIGSEGV (segmentation violation) signal that, by default, kills the program and dumps a core file if core dumps are enabled
    • The core file contains the values of the variables, both global and local, at the time of the SEGV signal
    • The core file can be used for debugging “post mortem”, after the program has died
      • Run gdb program-name core, then type where at the (gdb) prompt to print the call stack at the time of the crash

Building a Program

  • The programmer writes a program hello.c
  • The preprocessor expands the #define, #include, #ifdef, etc. preprocessor directives and generates a hello.i file
  • The compiler compiles hello.i, optimizes it, and generates assembly instructions in hello.s
  • The assembler assembles hello.s and generates an object file hello.o
  • The linker combines the program's object files (.o files) with the object files in any static libraries to produce the executable
    • The linker will also take definitions in shared libraries and verify that the functions and variables needed by the program are there
    • If there is a symbol that is not defined in either the executable or the shared libraries, the linker will report an undefined symbol error
    • Static libraries (.a files) are added to the executable, whereas shared libraries (.so files) are not added
  • The compiler driver (for example gcc) by default hides all these intermediate steps and produces only the final executable file
  • The loader is a program that is used to run an executable file in a process
    • Before the program runs, the loader allocates space for all the sections of the executable file (text, data, BSS, etc.)
    • It loads into memory the executable and shared libraries (if they aren’t already in memory)
    • It also resolves any values in the executable that point to things in shared libraries, for example calls to printf
    • Once the memory is ready, the loader jumps to the _start entry point, which initializes all the libraries and then calls main()
    • The loader is also called the “runtime linker”