The Portable Executable (PE) format
The first thing to start with is the PE format. Knowledge and understanding of this format is a prerequisite for developing antivirus engines for the Windows platform (historically, the vast majority of viruses in the world are targeted at Windows).
The Portable Executable (PE) format is a file format used by the Windows operating system to store executable files, such as .EXE and .DLL files. It was introduced with the release of Windows NT in 1993, and has since become the standard format for executables on Windows systems.
Before the introduction of the PE format, Windows used other formats for executable files, including the New Executable (NE) format for 16-bit programs and the Linear Executable (LE) format for 32-bit virtual device drivers (VxDs). Each format had its own rules and conventions, which made it harder for the operating system to load and run programs reliably.
To standardize the layout and structure of executable files, Microsoft introduced the PE format with the release of Windows NT. PE was originally a 32-bit format; the PE32+ variant later extended it to 64-bit programs, so the same overall layout serves both.
One of the key features of the PE format is its standardized headers, located at the beginning of the file, whose fields give the operating system important information about the executable file. They consist of the IMAGE_DOS_HEADER and the IMAGE_NT_HEADERS structures; the latter contains a signature and two main parts, the IMAGE_FILE_HEADER and the IMAGE_OPTIONAL_HEADER.
Most of the PE format structures are declared in the header file WinNT.h.
IMAGE_DOS_HEADER
The IMAGE_DOS_HEADER structure is a legacy header kept for backward compatibility with MS-DOS. It holds the information MS-DOS needs to load a program, such as the location of the program's code and data in the file and the program's entry point. In a PE file it is followed by a small MS-DOS stub program: if the file is started under MS-DOS, the stub runs instead of the Windows code and typically prints a message such as "This program cannot be run in DOS mode."
typedef struct _IMAGE_DOS_HEADER
{
WORD e_magic;
WORD e_cblp;
WORD e_cp;
WORD e_crlc;
WORD e_cparhdr;
WORD e_minalloc;
WORD e_maxalloc;
WORD e_ss;
WORD e_sp;
WORD e_csum;
WORD e_ip;
WORD e_cs;
WORD e_lfarlc;
WORD e_ovno;
WORD e_res[4];
WORD e_oemid;
WORD e_oeminfo;
WORD e_res2[10];
LONG e_lfanew; // file offset of IMAGE_NT_HEADERS
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
The following fields are of interest to us:
-
The e_magic field identifies the file as an MZ (MS-DOS) executable, which every PE file also is. It is a 16-bit unsigned integer set to 0x5A4D (IMAGE_DOS_SIGNATURE). Stored in little-endian byte order, this value appears in the file as the bytes 0x4D 0x5A, the ASCII characters "MZ".
-
The e_lfanew field specifies the location of the IMAGE_NT_HEADERS structure, which describes the layout and characteristics of the PE file. It is a 32-bit signed integer that holds the offset of that structure from the beginning of the file.
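The two checks described above can be sketched without the Windows SDK. This is a minimal illustration, not the parser developed later in this article: readLE and findNtHeadersOffset are hypothetical names, the byte offsets are derived from the IMAGE_DOS_HEADER declaration above, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: reads a value of type T stored at 'offset' in the buffer.
// memcpy avoids alignment problems; a little-endian host is assumed.
template <typename T>
T readLE(const std::vector<uint8_t>& buf, std::size_t offset)
{
    assert(offset + sizeof(T) <= buf.size());
    T value{};
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

constexpr uint16_t    kDosSignature  = 0x5A4D; // "MZ", same value as IMAGE_DOS_SIGNATURE
constexpr std::size_t kDosHeaderSize = 64;     // sizeof(IMAGE_DOS_HEADER)
constexpr std::size_t kLfanewOffset  = 60;     // offset of e_lfanew inside the header

// Returns the file offset of the NT headers, or std::nullopt if the buffer
// does not start with a plausible DOS header.
std::optional<uint32_t> findNtHeadersOffset(const std::vector<uint8_t>& file)
{
    if (file.size() < kDosHeaderSize)
        return std::nullopt;
    if (readLE<uint16_t>(file, 0) != kDosSignature)
        return std::nullopt;

    // e_lfanew is signed: reject negative or zero values and offsets that
    // leave no room for the 4-byte PE signature.
    const int32_t lfanew = readLE<int32_t>(file, kLfanewOffset);
    if (lfanew <= 0 || static_cast<uint64_t>(lfanew) + 4 > file.size())
        return std::nullopt;
    return static_cast<uint32_t>(lfanew);
}
```

Validating e_lfanew against the file size before using it matters for an antivirus engine, because malformed or hostile files may contain arbitrary values in this field.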
History
In the early 1980s, Microsoft was working on a new operating system called MS-DOS, which was designed to be a simple, lightweight operating system for personal computers. One of the key features of MS-DOS was its ability to run executables, which are programs that can be run on a computer.
To make executables easy to identify, the developers of MS-DOS placed a special "magic number" at the beginning of every file in the EXE format. This magic number distinguishes such executables from other types of files, such as data files or configuration files.
Mark Zbikowski, who was a developer on the MS-DOS team, chose the characters "MZ", his initials, as the magic number. In ASCII, the letter "M" is represented by the hexadecimal value 0x4D, and the letter "Z" by 0x5A. Because x86 stores multi-byte values in little-endian order, the two bytes 0x4D 0x5A, read as a 16-bit integer, give the magic number 0x5A4D.
Today, the "MZ" signature is still used to identify PE files, which are the main executable file format used on the Windows operating system. It is stored in the e_magic field of the IMAGE_DOS_HEADER structure, which is the first structure in a PE file.
IMAGE_NT_HEADERS
The IMAGE_NT_HEADERS structure was introduced with the Windows NT operating system, which was released in 1993. It gives the operating system a single, standard way of reading and interpreting the contents of executable (PE) files, so the loader only has to support one format. The structure exists in a 32-bit and a 64-bit variant, documented at the links below.
https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_nt_headers32
https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_nt_headers64
typedef struct _IMAGE_NT_HEADERS32
{
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
typedef struct _IMAGE_NT_HEADERS64
{
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
The IMAGE_NT_HEADERS structure is located at the file offset given by the e_lfanew field of the IMAGE_DOS_HEADER. It contains a number of fields that provide the operating system with important information about the executable file, such as its size, layout, and intended purpose.
The structure begins with a 4-byte Signature, which is IMAGE_NT_SIGNATURE (0x00004550, the bytes "PE\0\0"), followed by two main parts: the IMAGE_FILE_HEADER and the IMAGE_OPTIONAL_HEADER.
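As a rough sketch of how the parts of the NT headers follow one another in the file, the helper below checks the "PE\0\0" signature at a given offset and computes where the file header and the optional header begin. NtHeaderLayout and locateNtHeaders are hypothetical names used only for this illustration; the 20-byte size is that of the IMAGE_FILE_HEADER structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// File offsets of the three parts of the NT headers (hypothetical type).
struct NtHeaderLayout
{
    std::size_t signature;      // DWORD Signature ("PE\0\0")
    std::size_t fileHeader;     // IMAGE_FILE_HEADER
    std::size_t optionalHeader; // IMAGE_OPTIONAL_HEADER32 or IMAGE_OPTIONAL_HEADER64
};

constexpr std::size_t kSignatureSize  = 4;
constexpr std::size_t kFileHeaderSize = 20; // sizeof(IMAGE_FILE_HEADER)

// Returns true and fills 'out' if 'data' holds "PE\0\0" at offset 'lfanew'
// and the fixed-size file header still fits in the buffer.
bool locateNtHeaders(const uint8_t* data, std::size_t size,
                     std::size_t lfanew, NtHeaderLayout& out)
{
    assert(data != nullptr);
    if (lfanew > size || size - lfanew < kSignatureSize + kFileHeaderSize)
        return false;

    static const uint8_t kPeSignature[4] = { 'P', 'E', 0, 0 }; // IMAGE_NT_SIGNATURE as bytes
    if (std::memcmp(data + lfanew, kPeSignature, sizeof(kPeSignature)) != 0)
        return false;

    out.signature      = lfanew;
    out.fileHeader     = lfanew + kSignatureSize;
    out.optionalHeader = out.fileHeader + kFileHeaderSize;
    return true;
}
```

The value passed as lfanew would normally come from the e_lfanew field of the DOS header.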
IMAGE_FILE_HEADER
The IMAGE_FILE_HEADER contains information about the executable file as a whole, including its machine type (e.g. x86, x64), the number of sections in the file, and the date and time the file was created.
https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_file_header
typedef struct _IMAGE_FILE_HEADER
{
WORD Machine;
WORD NumberOfSections;
DWORD TimeDateStamp;
DWORD PointerToSymbolTable;
DWORD NumberOfSymbols;
WORD SizeOfOptionalHeader;
WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
The structure has the following fields:
-
Machine: This field specifies the target architecture for which the file was built. The value of this field is determined by the compiler when the file is built. Some common values are:
-
IMAGE_FILE_MACHINE_I386: The file is intended to run on x86 architecture, also known as 32-bit.
-
IMAGE_FILE_MACHINE_AMD64: The file is intended to run on x64 architecture, also known as 64-bit.
-
IMAGE_FILE_MACHINE_ARM: The file is intended to run on ARM architecture.
-
NumberOfSections: This field specifies the number of sections in the PE file. A PE file is divided into several sections, each of which contains different types of information such as code, data, and resources. This field is used by the operating system to determine how many sections are present in the file.
-
TimeDateStamp: This field contains the timestamp of when the file was built. The timestamp is stored as a 4-byte value representing the number of seconds since January 1, 1970, 00:00:00 UTC. This field can be used to determine when the file was last built, which can be useful for debugging or version management.
-
PointerToSymbolTable: This field specifies the file offset of the COFF (Common Object File Format) symbol table, if present. The COFF symbol table contains information about symbols used in the file, such as function names, variable names, and line numbers. This field is only used for debugging purposes and is typically not present in release builds.
-
NumberOfSymbols: This field specifies the number of symbols in the COFF symbol table, if present. This field is used in conjunction with PointerToSymbolTable to locate the COFF symbol table in the file.
-
SizeOfOptionalHeader: This field specifies the size of the optional header, in bytes. Because the section table starts immediately after the optional header, this value is also needed to locate the first section header. The optional header includes information such as the entry point of the file, the preferred image base, and the sizes of the stack and heap.
-
Characteristics: This field specifies various attributes of the file. Some common values are:
-
IMAGE_FILE_EXECUTABLE_IMAGE: The file is an executable file.
-
IMAGE_FILE_DLL: The file is a dynamic-link library (DLL).
-
IMAGE_FILE_32BIT_MACHINE: The file targets a machine with a 32-bit word architecture.
-
IMAGE_FILE_DEBUG_STRIPPED: The file has been stripped of debug information.
These fields provide important information about the file that is used by the operating system when loading the file into memory and executing it. By understanding the fields in the IMAGE_FILE_HEADER structure, you can gain a deeper understanding of how PE files are structured and how the operating system uses them.
Most of the possible values for each field are declared in the header file WinNT.h.
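To make the Machine and Characteristics fields more concrete, here is a small sketch that turns them into readable text. The numeric constants are copied from WinNT.h so that the example builds without the Windows SDK; machineName and describeCharacteristics are hypothetical helper names, and only the values mentioned in this section are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values of the IMAGE_FILE_MACHINE_* constants discussed above.
constexpr uint16_t kMachineI386  = 0x014c;
constexpr uint16_t kMachineAmd64 = 0x8664;
constexpr uint16_t kMachineArm   = 0x01c0;

// Values of the IMAGE_FILE_* characteristics flags discussed above.
constexpr uint16_t kFileExecutableImage = 0x0002;
constexpr uint16_t kFile32BitMachine    = 0x0100;
constexpr uint16_t kFileDebugStripped   = 0x0200;
constexpr uint16_t kFileDll             = 0x2000;

// Maps the Machine field to a short architecture name.
std::string machineName(uint16_t machine)
{
    switch (machine)
    {
    case kMachineI386:  return "x86";
    case kMachineAmd64: return "x64";
    case kMachineArm:   return "ARM";
    default:            return "unknown";
    }
}

// Characteristics is a bit mask, so several flags may be set at once.
std::string describeCharacteristics(uint16_t characteristics)
{
    std::string out;
    auto add = [&](uint16_t flag, const char* name)
    {
        if (characteristics & flag)
        {
            if (!out.empty())
                out += '|';
            out += name;
        }
    };
    add(kFileExecutableImage, "EXECUTABLE_IMAGE");
    add(kFile32BitMachine,    "32BIT_MACHINE");
    add(kFileDebugStripped,   "DEBUG_STRIPPED");
    add(kFileDll,             "DLL");
    return out;
}
```

A DLL normally has both IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL set, which is why the flags have to be tested individually rather than compared for equality.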
IMAGE_OPTIONAL_HEADER
The IMAGE_FILE_HEADER structure is followed by the optional header, which is described by the IMAGE_OPTIONAL_HEADER structure. The optional header contains additional information about the image, such as the address of the entry point, the size of the image, and the address of the import directory.
https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_optional_header32
https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_optional_header64
typedef struct _IMAGE_OPTIONAL_HEADER32
{
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
DWORD BaseOfData;
DWORD ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
DWORD SizeOfHeapReserve;
DWORD SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
typedef struct _IMAGE_OPTIONAL_HEADER64
{
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
ULONGLONG ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
ULONGLONG SizeOfStackReserve;
ULONGLONG SizeOfStackCommit;
ULONGLONG SizeOfHeapReserve;
ULONGLONG SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
Here's a detailed description of each field in the IMAGE_OPTIONAL_HEADER structure:
-
Magic: This field identifies the type of optional header present in the PE file. It is IMAGE_NT_OPTIONAL_HDR32_MAGIC (0x10b) for a PE32 (32-bit) file and IMAGE_NT_OPTIONAL_HDR64_MAGIC (0x20b) for a PE32+ (64-bit) file.
-
MajorLinkerVersion and MinorLinkerVersion: These fields specify the version of the linker that was used to build the file. The linker is a tool that is used to combine object files and libraries into a single executable file.
-
SizeOfCode: This field specifies the size of the code section, in bytes, or the sum of all code sections if there are several. Code sections contain the machine code of the executable file.
-
SizeOfInitializedData: This field specifies the size of the initialized data section, in bytes, or the sum of all such sections if there are several. Initialized data has its initial values stored in the file, for example global variables with initializers.
-
SizeOfUninitializedData: This field specifies the size of the uninitialized data section, in bytes, or the sum of all such sections if there are several. Uninitialized data (the .bss section, for example) takes no space in the file and is zero-filled when the image is loaded.
-
AddressOfEntryPoint: This field specifies the address of the entry point as a relative virtual address (RVA), that is, relative to the image base once the file is loaded. For an executable, this is the first instruction that runs; for a DLL, the entry point is optional, and the field is 0 when there is none.
-
BaseOfCode: This field specifies the RVA of the beginning of the code section.
-
ImageBase: This field specifies the preferred virtual address at which the file should be loaded into memory. All RVAs within the file are relative to this address. If the image cannot be loaded at the preferred address, the loader chooses another base and applies relocations.
-
SectionAlignment: This field specifies the alignment, in bytes, of sections when they are loaded into memory. It must be greater than or equal to FileAlignment; the default is the page size of the target architecture.
-
FileAlignment: This field specifies the alignment, in bytes, of the raw data of sections in the file on disk. The value is a power of 2 between 512 and 64K; the default is 512.
-
MajorOperatingSystemVersion and MinorOperatingSystemVersion: These fields specify the minimum required version of the operating system that is needed to run the file.
-
MajorImageVersion and MinorImageVersion: These fields specify the version of the image. The image version is used to identify the version of the file for version management purposes.
-
MajorSubsystemVersion and MinorSubsystemVersion: These fields specify the version of the subsystem that is required to run the file. The subsystem is the environment in which the file runs, such as the Windows Console or Windows GUI.
-
Win32VersionValue: This field is reserved and typically set to 0.
-
SizeOfImage: This field specifies the size of the image, in bytes, including all headers, when loaded into memory. It must be a multiple of SectionAlignment.
-
SizeOfHeaders: This field specifies the combined size, in bytes, of the MS-DOS stub, the PE headers, and the section headers, rounded up to a multiple of FileAlignment.
-
CheckSum: This field holds the image file checksum. The loader validates it for drivers, for DLLs loaded at boot time, and for DLLs loaded into critical system processes; for ordinary user-mode executables it is often left at 0 and is not verified. The checksum can reveal accidental corruption, but it is not a cryptographic protection against tampering.
-
Subsystem: This field specifies the subsystem that is required to run the file. The possible values include IMAGE_SUBSYSTEM_NATIVE, IMAGE_SUBSYSTEM_WINDOWS_GUI, IMAGE_SUBSYSTEM_WINDOWS_CUI, IMAGE_SUBSYSTEM_OS2_CUI, etc.
-
DllCharacteristics: Despite its name, this field applies to both EXE and DLL images. It holds flags that control loader and security features, such as IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (the image can be relocated at load time) and IMAGE_DLLCHARACTERISTICS_NX_COMPAT (the image is compatible with data execution prevention). Whether the file is a DLL is recorded in the Characteristics field of the IMAGE_FILE_HEADER, not here.
-
SizeOfStackReserve: This field specifies the size of the stack, in bytes, to reserve for the program. The stack is used for storing temporary data, such as function call information. Reserving only sets aside a range of virtual addresses.
-
SizeOfStackCommit: This field specifies the size of the stack, in bytes, to commit initially. Committed pages are backed by physical storage; the rest of the reserved range is committed as the stack grows.
-
SizeOfHeapReserve: This field specifies the size of the default process heap, in bytes, to reserve. The heap is used for allocating memory dynamically at runtime.
-
SizeOfHeapCommit: This field specifies the size of the default process heap, in bytes, to commit initially. The rest of the reserved range is committed as the heap grows.
-
LoaderFlags: This field is reserved and typically set to 0.
-
NumberOfRvaAndSizes: This field specifies the number of data directory entries in the IMAGE_OPTIONAL_HEADER. The data directories contain information about the imports, exports, resources, etc. in the file.
-
DataDirectory: This field is an array of IMAGE_DATA_DIRECTORY structures, each of which specifies the RVA and size of one data directory (imports, exports, resources, and so on) in the file.
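The Magic field decides which of the two optional header layouts applies, and therefore where the DataDirectory array starts. The sketch below illustrates this with offsets derived from the two structure declarations above (96 bytes precede DataDirectory in IMAGE_OPTIONAL_HEADER32, 112 bytes in IMAGE_OPTIONAL_HEADER64). PeKind and the three function names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class PeKind { Unknown, Pe32, Pe32Plus };

constexpr uint16_t kOptionalHdr32Magic = 0x10b; // IMAGE_NT_OPTIONAL_HDR32_MAGIC
constexpr uint16_t kOptionalHdr64Magic = 0x20b; // IMAGE_NT_OPTIONAL_HDR64_MAGIC

// Classifies the optional header by its Magic field.
PeKind classifyOptionalHeader(uint16_t magic)
{
    switch (magic)
    {
    case kOptionalHdr32Magic: return PeKind::Pe32;
    case kOptionalHdr64Magic: return PeKind::Pe32Plus;
    default:                  return PeKind::Unknown;
    }
}

// Offset of DataDirectory[0] from the start of the optional header.
// PE32+ drops BaseOfData (4 bytes) but widens ImageBase and the four
// stack/heap sizes from 4 to 8 bytes each, hence 96 - 4 + 20 = 112.
std::size_t dataDirectoryOffset(PeKind kind)
{
    assert(kind != PeKind::Unknown);
    return kind == PeKind::Pe32 ? 96 : 112;
}

// The section table starts right after the optional header, whose actual
// size is taken from IMAGE_FILE_HEADER::SizeOfOptionalHeader.
std::size_t sectionTableOffset(std::size_t optionalHeaderOffset,
                               uint16_t sizeOfOptionalHeader)
{
    return optionalHeaderOffset + sizeOfOptionalHeader;
}
```

Using SizeOfOptionalHeader rather than sizeof of the structure keeps the calculation valid for files whose NumberOfRvaAndSizes is not 16.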
IMAGE_SECTION_HEADER
A section, in the context of a PE (Portable Executable) file, is a contiguous block of the file that holds a specific type of data or code. Sections are used to organize and store the different parts of the file, such as the code, data, and resources.
Each section is described by an IMAGE_SECTION_HEADER structure, which records the section's name, its size and location in the file, the relative virtual address at which it is loaded, and its characteristics. Section names are a convention and do not have to be unique.
The section headers form a table that immediately follows the optional header and contains NumberOfSections entries. They are used to locate and access specific parts of the file, such as the code or data sections.
The IMAGE_SECTION_HEADER structure is declared in the winnt.h header file of the Windows SDK. Here is how the structure looks in C++:
#pragma pack(push, 1)
typedef struct _IMAGE_SECTION_HEADER
{
BYTE Name[IMAGE_SIZEOF_SHORT_NAME]; // 8
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
DWORD VirtualAddress;
DWORD SizeOfRawData;
DWORD PointerToRawData;
DWORD PointerToRelocations;
DWORD PointerToLinenumbers;
WORD NumberOfRelocations;
WORD NumberOfLinenumbers;
DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
#pragma pack(pop)
The structure holds the section's name, its virtual size and address, the size and file offset of its raw data, and the locations and counts of its relocations and line numbers. The Characteristics field holds flags that describe the section, such as whether it is executable, readable, or writable.
-
Name: This 8-byte array holds the name of the section, padded with null bytes. A name that is exactly 8 bytes long has no terminating null, so the array must not be read as a C string. Typical names are ".text" for executable code, ".data" for initialized data, ".rdata" for read-only data, and ".bss" for uninitialized data. The names are a convention used by linkers, debuggers, and other tools; the loader relies on the Characteristics flags and the data directories rather than on section names.
-
VirtualSize: This field specifies the total size of the section, in bytes, when loaded into memory. If it is greater than SizeOfRawData, the remainder of the section is zero-filled.
-
VirtualAddress: This field specifies the address of the first byte of the section when loaded into memory, as an RVA relative to the image base.
-
SizeOfRawData: This field specifies the size of the section's data in the file, in bytes. It is a multiple of FileAlignment, so it can be larger than VirtualSize because of rounding, or smaller when part of the section is uninitialized.
-
PointerToRawData: This field specifies the file offset of the first byte of the section's data. It is a multiple of FileAlignment and is 0 for a section that contains only uninitialized data.
-
PointerToRelocations: This field specifies the file offset of the COFF relocation entries for the section. It is used in object files; in executable images it is 0, because image relocations are described by the base relocation data directory instead.
-
PointerToLinenumbers: This field specifies the file offset of the COFF line-number entries for the section, or 0 if there are none. Line-number information is used for debugging and maps code in the section back to source lines.
-
NumberOfRelocations: This field specifies the number of COFF relocation entries for the section. It is 0 for executable images.
-
NumberOfLinenumbers: This field specifies the number of COFF line-number entries for the section.
-
Characteristics: This field is a set of flags that specify the attributes of the section. Some of the common flags are: IMAGE_SCN_CNT_CODE (the section contains executable code), IMAGE_SCN_CNT_INITIALIZED_DATA (the section contains initialized data), IMAGE_SCN_CNT_UNINITIALIZED_DATA (the section contains uninitialized data), IMAGE_SCN_MEM_EXECUTE (the section can be executed), IMAGE_SCN_MEM_READ (the section can be read), and IMAGE_SCN_MEM_WRITE (the section can be written to). The operating system uses these flags to set the memory protection of the section when the file is loaded.
These fields are used by the operating system and other programs to manage the memory layout of the file, and to locate and access specific parts of the file, such as the code or data sections.
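Two of the fields above are easy to misread in practice: Name is not guaranteed to be null-terminated, and Characteristics is a bit mask. The following sketch, which uses the hypothetical helpers sectionName and sectionPermissions and flag values copied from WinNT.h, shows one cautious way to handle both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Values of the IMAGE_SCN_MEM_* flags from WinNT.h.
constexpr uint32_t kScnMemExecute = 0x20000000;
constexpr uint32_t kScnMemRead    = 0x40000000;
constexpr uint32_t kScnMemWrite   = 0x80000000;

// Copies at most 8 bytes and stops at the first null byte, so a name that
// fills the whole array is still read correctly.
std::string sectionName(const uint8_t (&name)[8])
{
    std::size_t len = 0;
    while (len < 8 && name[len] != 0)
        ++len;
    return std::string(reinterpret_cast<const char*>(name), len);
}

// Renders the memory-protection flags in the familiar "rwx" style.
std::string sectionPermissions(uint32_t characteristics)
{
    std::string p = "---";
    if (characteristics & kScnMemRead)    p[0] = 'r';
    if (characteristics & kScnMemWrite)   p[1] = 'w';
    if (characteristics & kScnMemExecute) p[2] = 'x';
    return p;
}
```

A section that is both writable and executable is unusual in compiler-generated files, so a scanner may want to treat "rwx" as a signal worth a closer look.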
IMPORTANT: Addresses stored in the PE headers and data directories are relative virtual addresses (RVAs): offsets from the image base that are valid once the file has been loaded into memory. They are not offsets into the file on disk.
For each section, the VirtualAddress field gives the RVA at which the section starts in memory, while the PointerToRawData field gives the file offset at which the section's data starts on disk. Despite its name, Misc.PhysicalAddress is not the file offset of the section in an image: it shares storage with VirtualSize, which is the value that matters for executable images.
To read a structure from the file on disk, convert its RVA to a file offset: find the section whose virtual range contains the RVA, subtract the section's VirtualAddress, and add its PointerToRawData. The rva2offset function in the parser below performs this conversion.
In summary, VirtualAddress locates a section in memory, and PointerToRawData locates it in the file.
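A worked example may help here. The sketch below is a simplified stand-in for the conversion, not the rva2offset function of the parser: SectionInfo and rvaToFileOffset are hypothetical names, and only RVAs that fall inside a section's stored data are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// The four IMAGE_SECTION_HEADER fields needed for the conversion.
struct SectionInfo
{
    uint32_t virtualAddress;   // RVA of the section in memory
    uint32_t virtualSize;      // size of the section in memory
    uint32_t pointerToRawData; // file offset of the section data
    uint32_t sizeOfRawData;    // size of the section data in the file
};

// Returns the file offset that corresponds to 'rva', or std::nullopt if the
// RVA is outside every section or in a part that has no bytes in the file.
std::optional<uint32_t> rvaToFileOffset(const std::vector<SectionInfo>& sections,
                                        uint32_t rva)
{
    for (const SectionInfo& s : sections)
    {
        if (rva < s.virtualAddress)
            continue;
        const uint32_t delta = rva - s.virtualAddress;
        // The RVA must be inside the section in memory AND inside the bytes
        // actually stored on disk (the zero-filled tail has no file offset).
        if (delta < s.virtualSize && delta < s.sizeOfRawData)
            return s.pointerToRawData + delta;
    }
    return std::nullopt;
}
```

For a section with VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1234 maps to the file offset 0x400 + (0x1234 - 0x1000) = 0x634.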
IMPORT
When a program is compiled, the compiler generates object files that contain the machine code for the program's functions. However, the object files may not have all the information required for the program to run. For example, the object files may contain calls to functions that are not defined in the program but are instead provided by external libraries.
This is where the import table comes in. The import table lists the DLLs the program depends on and the functions it needs from each of them. The Windows loader uses this information at load time to resolve the addresses of the imported functions and make them available to the program.
For example, consider a program that uses functions from the Windows operating system. The program may contain calls to the MessageBox function from the user32.dll library (exported as MessageBoxA and MessageBoxW), which displays a message box on the screen. To resolve the address of this function, the program needs to include an import for user32.dll in its import table.
Similarly, if a program needs to use functions from a third-party library, it needs to include an import for that library in its import table. For example, a program that uses functions from the OpenSSL library would include an import for the corresponding OpenSSL DLL in its import table.
IMAGE_IMPORT_DESCRIPTOR
The import directory is the data that the Windows loader uses to import functions and data from dynamic-link libraries (DLLs) into a portable executable (PE) image. Its location and size are given by the IMAGE_DIRECTORY_ENTRY_IMPORT entry of the DataDirectory array in the IMAGE_OPTIONAL_HEADER.
The loader resolves each imported function or variable by looking it up in the exporting DLL and writing its address into the import address table (IAT) of the importing image. This allows the PE file to use the functions and data from the DLLs as if they were part of the PE file itself.
The import directory is an array of IMAGE_IMPORT_DESCRIPTOR structures, one for each imported DLL, terminated by a descriptor filled with zeros. Each IMAGE_IMPORT_DESCRIPTOR structure contains the following fields:
-
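OriginalFirstThunk: the RVA of the import lookup table, which identifies each imported function by name or by ordinal.
-
TimeDateStamp: 0 if the import is not bound; otherwise a value that describes the binding (see the comments in the declaration below).
-
ForwarderChain: the index of the first forwarder reference, or -1 if there are no forwarders.
-
Name: the RVA of the name of the DLL, a null-terminated ASCII string.
-
FirstThunk: the RVA of the import address table (IAT). The loader overwrites the entries of this table with the actual addresses of the imported functions.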
typedef struct _IMAGE_IMPORT_DESCRIPTOR
{
union {
DWORD Characteristics; // 0 for terminating null import descriptor
DWORD OriginalFirstThunk; // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
} DUMMYUNIONNAME;
DWORD TimeDateStamp; // 0 if not bound,
// -1 if bound, and real date/time stamp
// in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
// O.W. date/time stamp of DLL bound to (Old BIND)
DWORD ForwarderChain; // -1 if no forwarders
DWORD Name;
DWORD FirstThunk; // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;
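Walking the descriptor array can be sketched as follows. ImportDescriptor mirrors the layout of IMAGE_IMPORT_DESCRIPTOR with plain integer fields so that the example builds without WinNT.h; isTerminator, countImportedDlls, and nameTableRva are hypothetical helpers, and the maxEntries bound is an assumption added to keep the loop from running past a malformed table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same five DWORD fields as IMAGE_IMPORT_DESCRIPTOR.
struct ImportDescriptor
{
    uint32_t originalFirstThunk;
    uint32_t timeDateStamp;
    uint32_t forwarderChain;
    uint32_t name;
    uint32_t firstThunk;
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor must be 20 bytes");

// The array ends with a descriptor whose fields are all zero.
bool isTerminator(const ImportDescriptor& d)
{
    return d.originalFirstThunk == 0 && d.timeDateStamp == 0 &&
           d.forwarderChain == 0 && d.name == 0 && d.firstThunk == 0;
}

// Counts descriptors up to the terminator, never reading more than
// maxEntries elements.
std::size_t countImportedDlls(const ImportDescriptor* table, std::size_t maxEntries)
{
    assert(table != nullptr);
    std::size_t count = 0;
    while (count < maxEntries && !isTerminator(table[count]))
        ++count;
    return count;
}

// RVA of the table used to read function names: OriginalFirstThunk when it
// is present, FirstThunk otherwise.
uint32_t nameTableRva(const ImportDescriptor& d)
{
    return d.originalFirstThunk != 0 ? d.originalFirstThunk : d.firstThunk;
}
```

In a real parser, maxEntries would be derived from the size of the import data directory and the remaining file size.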
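The names of the imported functions are read from the OriginalFirstThunk table, or from the FirstThunk table if OriginalFirstThunk is 0.
Each entry of that table either holds an ordinal (when its highest bit is set) or the RVA of an IMAGE_IMPORT_BY_NAME structure, which contains a hint followed by the function name as a null-terminated string.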
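Each entry of the OriginalFirstThunk (or FirstThunk) table is a 32-bit value in PE32 files and a 64-bit value in PE32+ files. The highest bit tells whether the function is imported by ordinal or by name. The sketch below decodes one such entry; ImportRef and decodeThunk are hypothetical names, and the masks follow the PE format specification.

```cpp
#include <cassert>
#include <cstdint>

// Result of decoding one import lookup table entry (hypothetical type).
struct ImportRef
{
    bool     byOrdinal;   // true: imported by ordinal, false: by name
    uint16_t ordinal;     // valid when byOrdinal is true
    uint32_t hintNameRva; // RVA of IMAGE_IMPORT_BY_NAME when byOrdinal is false
};

// 'thunk' is the raw table entry, zero-extended to 64 bits for PE32 files.
// 'is64' selects the PE32+ layout, where the flag is bit 63 instead of bit 31.
ImportRef decodeThunk(uint64_t thunk, bool is64)
{
    const uint64_t ordinalFlag = is64 ? 0x8000000000000000ull : 0x80000000ull;

    ImportRef ref{};
    if (thunk & ordinalFlag)
    {
        ref.byOrdinal = true;
        ref.ordinal   = static_cast<uint16_t>(thunk & 0xFFFF); // low 16 bits
    }
    else
    {
        ref.byOrdinal   = false;
        ref.hintNameRva = static_cast<uint32_t>(thunk & 0x7FFFFFFF); // low 31 bits
    }
    return ref;
}
```

An entry equal to 0 marks the end of the table, so a caller would stop before passing it to decodeThunk. The returned hintNameRva still has to be converted to a file offset before the name can be read from a file on disk.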
HOW IT WORKS?
The import mechanism implemented by Microsoft is compact and beautiful!
The addresses of all functions that the application uses from external libraries (including the Windows system libraries) are stored in a special table, the import table. The loader fills this table when the module is loaded (other mechanisms for filling imports are discussed later).
Further, each time a function is called from a third-party library, the compiler usually generates the following code:
call dword ptr [__cell_with_address_of_function] // for x86 architecture
call qword ptr [__cell_with_address_of_function] // for x64 architecture
Thus, in order to be able to call a function from a library, the system loader only needs to write the address of this function once in one place in the image.
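The same idea can be imitated in portable C++ with a table of function pointers. This is only an analogy for the mechanism, not how the Windows loader is implemented: g_iat, resolveImports, callThroughIat, and realFunction are all hypothetical names.

```cpp
#include <cassert>

// Signature shared by the "imported" function in this toy example.
using ImportedFn = int (*)(int);

// Stands in for a function exported by some DLL.
static int realFunction(int x)
{
    return x * 2;
}

// One cell of a toy import address table. It is empty until "load time".
static ImportedFn g_iat[1] = { nullptr };

// Plays the role of the loader: writes the address once, in one place.
void resolveImports()
{
    g_iat[0] = &realFunction;
}

// Plays the role of compiled code: every call goes through the table cell,
// like "call [__cell_with_address_of_function]".
int callThroughIat(int arg)
{
    assert(g_iat[0] != nullptr);
    return g_iat[0](arg);
}
```

Because all call sites read the same cell, the code itself never has to be patched, no matter where the library ends up in memory. The same property means that anything able to overwrite the cell can redirect every call, which is why import tables are of interest to security software.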
C++ PARSER
And now we will write the simplest parser (compatible with x86 and x64) of the executable file import table!
#include "stdafx.h"
/*
*
* Copyright (C) 2022, SToFU Systems S.L.
* All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*
*/
namespace ntpe
{
static constexpr int64_t g_kRvaError = -1; // same type as the return value of rva2offset
// These types are defined in NTPEParser.h
// typedef std::map< std::string, std::set< std::string >> IMPORT_LIST;
// typedef std::vector< IMAGE_SECTION_HEADER > SECTIONS_LIST;
//**********************************************************************************
// FUNCTION: alignUp(DWORD value, DWORD align)
//
// ARGS:
// DWORD value - value to align.
// DWORD align - alignment.
//
// DESCRIPTION:
// Aligns argument value with the given alignment.
//
// Documentation links:
// Alignment: https://learn.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=msvc-170
//
// RETURN VALUE:
// DWORD aligned value.
//
//**********************************************************************************
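DWORD alignUp(DWORD value, DWORD align)
{
/* a zero alignment would cause a division by zero; treat it as "no alignment" */
if (!align)
return value;
DWORD mod = value % align;
return value + (mod ? (align - mod) : 0);
}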
//**********************************************************************************
// FUNCTION: rva2offset(IMAGE_NTPE_DATA& ntpe, DWORD rva)
//
// ARGS:
// IMAGE_NTPE_DATA& ntpe - data from PE file.
// DWORD rva - relative virtual address.
//
// DESCRIPTION:
// Converts an RVA (relative virtual address) to a file offset.
//
// RETURN VALUE:
// int64_t offset.
// g_kRvaError (-1) in case of error.
//
//**********************************************************************************
int64_t rva2offset(IMAGE_NTPE_DATA& ntpe, DWORD rva)
{
try
{
/* an RVA below the first section lies in the headers, whose file offset equals the RVA */
PIMAGE_SECTION_HEADER sec = ntpe.sectionDirectories;
if (!ntpe.fileHeader->NumberOfSections || rva < sec->VirtualAddress)
return rva;
/* walk the section table */
for (uint32_t sectionIndex = 0; sectionIndex < ntpe.fileHeader->NumberOfSections; sectionIndex++, sec++)
{
/* compute the end of the section in memory, aligned to SectionAlignment */
DWORD secEnd = ntpe::alignUp(sec->Misc.VirtualSize, ntpe.SecAlign) + sec->VirtualAddress;
if (sec->VirtualAddress <= rva && secEnd > rva)
return rva - sec->VirtualAddress + sec->PointerToRawData;
};
}
catch (std::exception&)
{
}
return g_kRvaError;
};
//**********************************************************************************
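// FUNCTION: getNTPEData(char* fileMapBase, uint64_t fileSize)
//
// ARGS:
// char* fileMapBase - the starting address of the mapped file.
// uint64_t fileSize - the size of the mapped file, in bytes.
//
// DESCRIPTION:
// Parses the headers of a mapped PE file into an IMAGE_NTPE_DATA structure.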
//
// Documentation links:
// PE format structure: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
//
// RETURN VALUE:
// std::optional< IMAGE_NTPE_DATA >.
// std::nullopt in case of error.
//
//**********************************************************************************
#define initNTPE(HeaderType, cellSize) \
{ \
char* ntstdHeader = (char*)fileHeader + sizeof(IMAGE_FILE_HEADER); \
HeaderType* optHeader = (HeaderType*)ntstdHeader; \
/* the section table follows the optional header, whose actual size is SizeOfOptionalHeader */ \
data.sectionDirectories = (PIMAGE_SECTION_HEADER)(ntstdHeader + fileHeader->SizeOfOptionalHeader); \
data.SecAlign = optHeader->SectionAlignment; \
data.dataDirectories = optHeader->DataDirectory; \
data.CellSize = cellSize; \
}
std::optional< IMAGE_NTPE_DATA > getNTPEData(char* fileMapBase, uint64_t fileSize)
{
try
{
/* the file must be large enough to hold the DOS header */
if (!fileMapBase || fileSize < sizeof(IMAGE_DOS_HEADER))
return std::nullopt;
/* PIMAGE_DOS_HEADER from the starting address of the mapped view */
PIMAGE_DOS_HEADER dosHeader = (IMAGE_DOS_HEADER*)fileMapBase;
/* return std::nullopt if the IMAGE_DOS_SIGNATURE signature is missing */
if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
return std::nullopt;
/* e_lfanew is a signed offset: the PE signature and the file header must lie inside the file */
if (dosHeader->e_lfanew <= 0 ||
(uint64_t)dosHeader->e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER) > fileSize)
return std::nullopt;
/* PE signature address = base address + offset of the PE header from the beginning of the file */
PDWORD peSignature = (PDWORD)(fileMapBase + dosHeader->e_lfanew);
/* return std::nullopt if the PE signature is missing */
if (*peSignature != IMAGE_NT_SIGNATURE)
return std::nullopt;
/* the file header immediately follows the PE signature */
PIMAGE_FILE_HEADER fileHeader = (PIMAGE_FILE_HEADER)(peSignature + 1);
if (fileHeader->Machine != IMAGE_FILE_MACHINE_I386 &&
fileHeader->Machine != IMAGE_FILE_MACHINE_AMD64)
return std::nullopt;
/* result IMAGE_NTPE_DATA structure with info from PE file */
IMAGE_NTPE_DATA data = {};
/* base address and File header address assignment */
data.fileBase = fileMapBase;
data.fileHeader = fileHeader;
/* addresses of PIMAGE_SECTION_HEADER, PIMAGE_DATA_DIRECTORIES, SectionAlignment, CellSize depending on processor architecture */
switch (fileHeader->Machine)
{
case IMAGE_FILE_MACHINE_I386:
initNTPE(IMAGE_OPTIONAL_HEADER32, 4);
return data;
case IMAGE_FILE_MACHINE_AMD64:
initNTPE(IMAGE_OPTIONAL_HEADER64, 8);
return data;
}
}
catch (std::exception&)
{
}
return std::nullopt;
}
//**********************************************************************************
// FUNCTION: getImportList(IMAGE_NTPE_DATA& ntpe)
//
// ARGS:
// IMAGE_NTPE_DATA& ntpe - data from PE file.
//
// DESCRIPTION:
// Retrieves IMPORT_LIST(std::map< std::string, std::set< std::string >>) with the names of all libraries imported by the PE file and the functions imported from each of them.
// Map key: imported DLL name.
// Map value: set of imported function names.
//
// Documentation links:
// Import Directory Table: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#import-directory-table
//
// RETURN VALUE:
// std::optional< IMPORT_LIST >.
// std::nullopt in case of error.
//
//**********************************************************************************
std::optional< IMPORT_LIST > getImportList(IMAGE_NTPE_DATA& ntpe)
{
try
{
/* if the file has no import directory, return std::nullopt */
if (ntpe.dataDirectories[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress == 0)
return std::nullopt;
IMPORT_LIST result;
/* import table offset; kept as int64_t so that g_kRvaError is not truncated */
int64_t impOffset = rva2offset(ntpe, ntpe.dataDirectories[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
if (impOffset == g_kRvaError)
return std::nullopt;
/* first import descriptor = file base address + import table offset */
PIMAGE_IMPORT_DESCRIPTOR impTable = (PIMAGE_IMPORT_DESCRIPTOR)(ntpe.fileBase + impOffset);
/* the descriptor array is terminated by an entry whose Name is 0 */
while (impTable->Name != 0)
{
/* offset of the DLL name string */
int64_t nameOffset = rva2offset(ntpe, impTable->Name);
/* offset of the import lookup table; fall back to the import address table if OriginalFirstThunk is 0 */
int64_t thunkOffset = rva2offset(ntpe, impTable->OriginalFirstThunk ? impTable->OriginalFirstThunk : impTable->FirstThunk);
if (nameOffset == g_kRvaError || thunkOffset == g_kRvaError)
return std::nullopt;
std::string modname = ntpe.fileBase + nameOffset;
std::transform(modname.begin(), modname.end(), modname.begin(), [](unsigned char c) { return (char)::toupper(c); });
/* the highest bit of an entry marks an import by ordinal: bit 31 for PE32, bit 63 for PE32+ */
const uint64_t ordinalFlag = (ntpe.CellSize == 8) ? IMAGE_ORDINAL_FLAG64 : IMAGE_ORDINAL_FLAG32;
/* walk the lookup table until the terminating zero entry */
for (char* cell = ntpe.fileBase + thunkOffset;; cell += ntpe.CellSize)
{
uint64_t entry = 0;
memcpy(&entry, cell, ntpe.CellSize);
if (!entry)
break;
if (entry & ordinalFlag)
{
/* import by ordinal: the ordinal is stored in the low 16 bits */
result[modname].emplace(std::string("#ord: ") + std::to_string(entry & 0xFFFF));
continue;
}
/* import by name: the entry is the RVA of IMAGE_IMPORT_BY_NAME; skip its 2-byte Hint field */
int64_t hintOffset = rva2offset(ntpe, (DWORD)entry);
if (hintOffset != g_kRvaError)
result[modname].emplace(ntpe.fileBase + hintOffset + 2);
}
impTable++;
}
return result;
}
catch (std::exception&)
{
return std::nullopt;
}
}
//**********************************************************************************
// FUNCTION: getImportList(std::wstring_view filePath)
//
// ARGS:
// std::wstring_view filePath - path to file.
//
// DESCRIPTION:
// Retrieves IMPORT_LIST(std::map< std::string, std::set< std::string >>) with the names of all libraries imported by the PE file at the given path and the functions imported from each of them.
// Map key: imported DLL name.
// Map value: set of imported function names.
//
// Documentation links:
// Import Directory Table: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#import-directory-table
//
// RETURN VALUE:
// std::optional< IMPORT_LIST >.
// std::nullopt in case of error.
//
//**********************************************************************************
std::optional< IMPORT_LIST > getImportList(std::wstring_view filePath)
{
std::vector< char > buffer;
/* read the whole file into the buffer with the tools::readFile function */
bool result = tools::readFile(filePath, buffer);
/* return nullopt if readFile fails or the obtained buffer is empty */
if (!result || buffer.empty())
return std::nullopt;
/* get IMAGE_NTPE_DATA from the file contents held in the buffer */
std::optional< IMAGE_NTPE_DATA > ntpe = getNTPEData(buffer.data(), buffer.size());
if (!ntpe)
return std::nullopt;
/* return the result of the getImportList overload that takes IMAGE_NTPE_DATA */
return getImportList(*ntpe);
}
//**********************************************************************************
// FUNCTION: getSectionsList(IMAGE_NTPE_DATA& ntpe)
//
// ARGS:
// IMAGE_NTPE_DATA& ntpe - data from PE file.
//
// DESCRIPTION:
// Retrieves SECTIONS_LIST from IMAGE_NTPE_DATA.
// SECTIONS_LIST - vector of section headers from the portable executable file.
// Section name examples: .text, .data, .rsrc
//
// Documentation links:
// IMAGE_SECTION_HEADER: https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_section_header
// Section Table (Section Headers): https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers
//
// RETURN VALUE:
// std::optional< SECTIONS_LIST >.
// std::nullopt in case of error.
//
//**********************************************************************************
std::optional< SECTIONS_LIST > getSectionsList(IMAGE_NTPE_DATA& ntpe)
{
try
{
/* resulting vector of section headers */
SECTIONS_LIST result;
/* iterate over all section headers referenced by the IMAGE_NTPE_DATA structure */
for (uint64_t sectionIndex = 0; sectionIndex < ntpe.fileHeader->NumberOfSections; sectionIndex++)
{
/* copy the current IMAGE_SECTION_HEADER into the result */
result.push_back(ntpe.sectionDirectories[sectionIndex]);
}
return result;
}
catch (std::exception&)
{
}
/* returns nullopt in case of error */
return std::nullopt;
}
}
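The RVA-to-offset translation performed by rva2offset above can be tried outside of a Windows build with the minimal sketch below. It is an illustration under stated assumptions, not the project's code: Section is a hypothetical, simplified stand-in for IMAGE_SECTION_HEADER, rvaToOffset and example are hypothetical names, and the section values are made up. Unlike the listing above, the sketch also rejects RVAs that lie past SizeOfRawData, since such bytes exist only in memory and have no counterpart in the file.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for IMAGE_SECTION_HEADER (only the fields used here).
struct Section
{
    uint32_t virtualAddress;   // RVA of the section once the image is loaded
    uint32_t virtualSize;      // size of the section in memory
    uint32_t pointerToRawData; // file offset of the section's data
    uint32_t sizeOfRawData;    // size of the section's data on disk
};

// Rounds value up to the next multiple of align; align == 0 leaves value unchanged.
uint32_t alignUp(uint32_t value, uint32_t align)
{
    if (align == 0)
        return value;
    uint32_t mod = value % align;
    return value + (mod ? (align - mod) : 0);
}

// Translates an RVA to a file offset; std::nullopt if no file data backs the RVA.
std::optional<uint32_t> rvaToOffset(const std::vector<Section>& sections,
                                    uint32_t sectionAlignment, uint32_t rva)
{
    // RVAs below the first section fall into the headers, where RVA == file offset.
    if (sections.empty() || rva < sections.front().virtualAddress)
        return rva;
    for (const Section& sec : sections)
    {
        uint32_t secEnd = sec.virtualAddress + alignUp(sec.virtualSize, sectionAlignment);
        if (rva >= sec.virtualAddress && rva < secEnd)
        {
            uint32_t delta = rva - sec.virtualAddress;
            // Assumption added in this sketch: bytes past the raw data are not in the file.
            if (delta >= sec.sizeOfRawData)
                return std::nullopt;
            return sec.pointerToRawData + delta;
        }
    }
    return std::nullopt;
}

// Usage example with two made-up sections and a section alignment of 0x1000.
void example()
{
    std::vector<Section> sections = {
        { 0x1000, 0x0800, 0x0400, 0x0800 }, // e.g. ".text"
        { 0x2000, 0x0300, 0x0C00, 0x0200 }, // e.g. ".data"
    };
    assert(rvaToOffset(sections, 0x1000, 0x0100).value() == 0x0100); // inside the headers
    assert(rvaToOffset(sections, 0x1000, 0x1010).value() == 0x0410); // inside ".text"
    assert(rvaToOffset(sections, 0x1000, 0x2004).value() == 0x0C04); // inside ".data"
    assert(!rvaToOffset(sections, 0x1000, 0x2250).has_value());      // past the raw data
    assert(!rvaToOffset(sections, 0x1000, 0x5000).has_value());      // outside all sections
}
```

Returning std::optional instead of a sentinel such as g_kRvaError is a design choice of this sketch: the caller cannot accidentally add an error value to a base pointer.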
You can find the code of the entire project on our GitHub:
https://github.com/SToFU-Systems/DSAVE
List of tools used
- PE Tools: https://github.com/petoolse/petools An open-source tool for manipulating PE header fields. Supports x86 and x64 files.
- WinDbg: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools Microsoft's system debugger. Indispensable in the work of a Windows system programmer.
- x64Dbg: https://x64dbg.com A simple, lightweight open-source x64/x86 debugger for Windows.
- WinHex: http://www.winhex.com/winhex/hex-editor.html A universal hex editor, particularly helpful for computer forensics, data recovery, and low-level data editing.
WHAT IS NEXT?
We appreciate your support and look forward to your continued engagement in our community.
In the next article, we will write a fuzzy hashing module together and touch on the question of black and white lists.
Questions for the authors of the article can be sent by e-mail to: articles@stofu.io
Thank you for your attention and have a nice day!