|

Technical Brief 0001: Position-Independent Code (PIC) in 32-bit x86 Assembly for PIE Kernels
============================================================================================

An x86-64 kernel compiled as a Position-Independent Executable (PIE) still needs a 32-bit assembly bootstrap stage for initial hardware setup.
That stage is a common source of trouble at link time.
The linker may fail with the following diagnostic:

.. code-block:: text

   relocation R_X86_64_32 against symbol `...' can not be used when making a PIE object; recompile with -fPIE

The error appears even when the object file in question was built with ``-fPIE``.
It indicates that conventional 32-bit assembly instructions which reference symbols by absolute address produce machine code that is incompatible with the PIE linking model.
A compiler generates position-independent code automatically; hand-written assembly in a mixed-mode, relocatable binary must implement it manually.


Root Cause Analysis
-------------------

The error stems from a conflict between the linking model required by a Position-Independent Executable and the addressing capabilities of the 32-bit x86 instruction set architecture (ISA).

- **Position-Independent Executable (PIE) Constraints:**
  A PIE is a variant of the Executable and Linkable Format (ELF) [#1]_ designed to be loaded at an arbitrary virtual address and function correctly without modification.
  For this to work without load-time fix-ups, the code must not embed link-time absolute virtual addresses.
  Consequently, internal data and function references are encoded relative to the instruction pointer.
  In the x86-64 ISA, this is typically accomplished through the native ``RIP``-relative addressing mode (e.g., ``mov symbol(%rip), %rax``), which generates relocations of type ``R_X86_64_PC32``.
  The linker resolves these PC-relative relocations from the distance between the instruction and the symbol, a value that is the same at every load address.

- **32-bit Addressing Limitations:**
  The 32-bit x86 ISA has no instruction-pointer-relative addressing mode for data; only branch instructions such as ``call`` and ``jmp`` encode relative displacements.
  When 32-bit code in an x86-64 ELF object references a symbol by name (e.g., ``movl $symbol, %eax``), the assembler's default behavior is to generate a relocation entry of type ``R_X86_64_32``.
  This entry directs the linker to write the symbol's final 32-bit absolute virtual address into the machine code.
  The instruction then carries a hardcoded address, which makes the code position-dependent.

- **Mismatch:**
  During the final link, the linker encounters these requests for absolute addresses in the 32-bit object code.
  Its output target, however, is a PIE, which rejects such absolute relocations because the embedded addresses would be wrong at any other load address.
  The ``-fPIE`` flag is a *compiler* directive: it changes the code generation strategy for high-level languages like C++ but has no effect on hand-written assembly whose instructions inherently produce absolute address relocations.
  The linker therefore correctly reports the violation of the PIE contract and terminates with an error.

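
The two encodings can be compared directly in a small test file.
The following is a minimal sketch, not part of the kernel described here: ``symbol`` is a placeholder for any external symbol, and the relocation types named in the comments are what GNU ``as`` is expected to emit when producing an x86-64 ELF object.

.. code-block:: gas

           .text

           .code64
           /* 64-bit code: RIP-relative load.                       */
           /* Expected relocation: R_X86_64_PC32 (accepted in PIE). */
           movq    symbol(%rip), %rax

           .code32
           /* 32-bit code: the symbol's absolute address as an immediate. */
           /* Expected relocation: R_X86_64_32 (rejected in PIE).         */
           movl    $symbol, %eax

Running ``readelf -r`` on the resulting object file lists its relocations, which makes it possible to look for ``R_X86_64_32`` entries before the final link.
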

Solution: Runtime Address Calculation
-------------------------------------

Resolving this conflict requires implementing position-independent code by hand in the 32-bit assembly module.
Every instruction that would otherwise generate an absolute address relocation must be eliminated.
Instead, the absolute address of each required symbol is calculated at runtime relative to the current instruction pointer.

- **The** ``call``/``pop`` **Idiom:**
  The canonical technique for obtaining the value of the 32-bit instruction pointer (``EIP``) is a ``call`` to the immediately following instruction.
  The ``call`` instruction pushes its return address, which is the address of that next instruction, onto the stack.
  A ``pop`` instruction then retrieves this value into a general-purpose register.
  Because the idiom writes to the stack, ``%esp`` must already point to usable memory.

  .. code-block:: gas

     call .Lget_eip
     .Lget_eip:
     popl %ebx

  After this sequence, ``%ebx`` contains the runtime absolute virtual address of the ``.Lget_eip`` label.
  This address serves as the anchor from which the addresses of other symbols are calculated.

- **Establishing a Base Register:**
  By convention, and specifically in the i386 System V ABI, the ``%ebx`` register is used for this purpose.
  It is a callee-saved register, so any conforming function must preserve its value across calls.
  Once ``%ebx`` is established as the base register at the start of the bootstrap sequence, its value remains valid for all subsequent address calculations within that scope, even after calls to external C or C++ functions.
  A caller-saved register such as ``%eax`` is a poor choice, because its value must be treated as clobbered after every function call.

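
Put together, the start of a bootstrap routine might look like the sketch below.
The names ``bootstrap_entry``, ``early_setup_32``, and ``boot_scratch`` are hypothetical and exist only for illustration; the sketch also assumes that ``%esp`` already points to a usable stack and that ``early_setup_32`` is 32-bit code that preserves ``%ebx``.

.. code-block:: gas

           .code32
           .text
           .globl  bootstrap_entry
   bootstrap_entry:
           /* Assumption: %esp already points to usable stack memory. */
           call    .Lbase
   .Lbase:
           popl    %ebx                /* %ebx = runtime address of .Lbase */

           /* A relative call needs no absolute relocation. */
           call    early_setup_32

           /* %ebx is callee-saved, so it still anchors address calculations. */
           leal    (boot_scratch - .Lbase)(%ebx), %eax
           movl    $0, (%eax)          /* clear the hypothetical variable */

   1:      hlt
           jmp     1b

   early_setup_32:                     /* hypothetical routine; leaves %ebx untouched */
           ret

           .bss
           .balign 4
   boot_scratch:                       /* hypothetical 4-byte variable */
           .skip   4

The only symbol references left are relative branches and differences against ``.Lbase``, so none of them should require an absolute relocation.
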

Representative Implementations
------------------------------

The following examples show how to convert common position-dependent assembly instructions into their PIE-compatible equivalents.
They assume that the base register ``%ebx`` has been loaded, via the ``call``/``pop`` idiom, with the runtime address of a label which, for the purpose of these examples, is named ``.Lbase``.


Accessing a Symbol's Address
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This pattern applies when a pointer to a symbol is passed as a function argument.

- Problematic Code:

  .. code-block:: gas

     pushl $message_prefix_panic

- PIE-Compatible Solution:

  .. code-block:: gas

     // Calculate the address: base_address + (symbol_address - base_address).
     // The term (message_prefix_panic - .Lbase) is a link-time constant offset.
     leal (message_prefix_panic - .Lbase)(%ebx), %eax
     pushl %eax


Accessing a Symbol's Content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This pattern applies when reading from or writing to a global variable.

- Problematic Code:

  .. code-block:: gas

     movl (vga_buffer_pointer), %esi

- PIE-Compatible Solution:

  .. code-block:: gas

     // First, calculate the address of the pointer variable into a register.
     leal (vga_buffer_pointer - .Lbase)(%ebx), %edi
     // Then, dereference the pointer via the register to access its content.
     movl (%edi), %esi

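
When the variable's address is not needed afterwards, the two steps can be folded into one instruction by using the link-time offset as the displacement of the memory operand.
A minimal sketch, assuming the same ``.Lbase`` and ``%ebx`` setup as in the examples above:

.. code-block:: gas

   /* Read the content of vga_buffer_pointer in a single step:       */
   /* the displacement (vga_buffer_pointer - .Lbase) is added to the */
   /* runtime address of .Lbase held in %ebx.                        */
   movl (vga_buffer_pointer - .Lbase)(%ebx), %esi

This variant leaves ``%edi`` free, while the two-step form is preferable when the same address is reused several times.
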

Complex Addressing Modes
~~~~~~~~~~~~~~~~~~~~~~~~

This pattern is frequently used for array access.

- Problematic Code:

  .. code-block:: gas

     movl %eax, page_map_level_2(,%ecx,8)

- PIE-Compatible Solution:

  .. code-block:: gas

     // Calculate the base address of the array into a register.
     leal (page_map_level_2 - .Lbase)(%ebx), %edx
     // Utilize the register as the base in the complex addressing mode.
     movl %eax, (%edx, %ecx, 8)

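
Because the problematic form leaves the base register slot empty, ``%ebx`` itself can fill it, with the link-time offset as the displacement.
The following sketch assumes the same ``.Lbase`` and ``%ebx`` setup as above and is one possible variant rather than a required form:

.. code-block:: gas

   /* Store %eax into entry %ecx of page_map_level_2 (8 bytes per entry). */
   /* Effective address: %ebx + %ecx * 8 + (page_map_level_2 - .Lbase).   */
   movl %eax, (page_map_level_2 - .Lbase)(%ebx,%ecx,8)

This avoids the scratch register ``%edx``, which can matter inside loops that fill many table entries.
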

Far Jumps
~~~~~~~~~

A far jump reloads the code segment register, which is required after loading a new Global Descriptor Table (GDT) and when transitioning to 64-bit mode.

- Problematic Code:

  .. code-block:: gas

     jmp $global_descriptor_table_code, $_transition_to_long_mode

- PIE-Compatible Solution (using ``lret``):

  .. code-block:: gas

     // Calculate the absolute virtual address of the 64-bit entry point.
     leal (_transition_to_long_mode - .Lbase)(%ebx), %eax
     // Push the new segment selector and the calculated address onto the stack.
     pushl $global_descriptor_table_code
     pushl %eax
     // lret performs a far return: it pops the address, then the selector,
     // thereby achieving an indirect, position-independent far jump.
     lret


.. rubric:: References

.. [#1] M. Matz, J. Hubička, A. Jaeger, and M. Mitchell, “System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version,” 2012. Available: https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf

|