Diffstat (limited to 'docs/briefs')
-rw-r--r--docs/briefs/tb0001-pic-in-32-bit-x86-assembly.rst161
-rw-r--r--docs/briefs/tb0002-x86_64_bootstrap.rst154
2 files changed, 315 insertions, 0 deletions
diff --git a/docs/briefs/tb0001-pic-in-32-bit-x86-assembly.rst b/docs/briefs/tb0001-pic-in-32-bit-x86-assembly.rst
new file mode 100644
index 0000000..503ff43
--- /dev/null
+++ b/docs/briefs/tb0001-pic-in-32-bit-x86-assembly.rst
@@ -0,0 +1,161 @@
+Technical Brief 0001: Position-Independent Code (PIC) in 32-bit x86 Assembly for PIE Kernels
+============================================================================================
+
+The design of a modern x86-64 kernel, compiled as a Position-Independent Executable (PIE), necessitates a 32-bit assembly bootstrap stage for initial hardware setup.
+This architectural requirement, however, introduces significant challenges during the linking phase.
+A linker error may manifest during this process, presenting the following diagnostic:
+
+.. code-block:: text
+
+ relocation R_X86_64_32 against symbol `...' can not be used when making a PIE object; recompile with -fPIE
+
+This error arises despite the explicit use of the ``-fPIE`` compilation flag for the object file in question.
+Its occurrence indicates a fundamental incompatibility between the linking model of a PIE and the machine code generated from conventional 32-bit assembly instructions that reference symbolic addresses.
+This scenario reveals a critical distinction between compiler-generated position independence and the manual implementation required for hand-written assembly in a mixed-mode, relocatable binary.
+
+Root Cause Analysis
+-------------------
+
+The cause of this issue is a conflict between the linking model mandated by a Position-Independent Executable and the addressing capabilities inherent to the 32-bit x86 instruction set architecture (ISA).
+
+- **Position-Independent Executable (PIE) Constraints:**
+ A PIE is a variant of the Executable and Linkable Format (ELF) [#1]_ designed to be loaded at an arbitrary virtual address and function correctly without modification.
+ A strict prerequisite for this functionality is the absence of absolute virtual addresses in the binary's code; any absolute addresses stored in data must instead be fixed up at load time through dynamic relocations such as ``R_X86_64_RELATIVE``, which operate on full 64-bit fields.
+ Consequently, all internal data and function references made from code must be encoded relative to the instruction pointer.
+ In the x86-64 ISA, this is typically accomplished through the native ``RIP``-relative addressing mode (e.g., ``mov symbol(%rip), %rax``), which generates relocations of type ``R_X86_64_PC32``.
+ These PC-relative relocations are resolved by the linker based on the distance between the instruction and the symbol, a value that is constant regardless of the final load address.
+
+- **32-bit Addressing Limitations:**
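+ The 32-bit x86 ISA lacks a native instruction-pointer-relative addressing mode for data accesses; relative encodings exist only for branch targets such as those of ``call`` and ``jmp``.
+ When an assembly instruction references a symbol by its name (e.g., ``movl $symbol, %eax``) and the 32-bit code is assembled into an ELF64 object so that it can be linked into the x86-64 kernel, the assembler generates a relocation entry of type ``R_X86_64_32``.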
+ This entry serves as a directive for the linker to substitute the symbol's final, 32-bit absolute virtual address into the machine code during the linking phase.
+ This process fundamentally embeds a hardcoded address into the instruction, making the code position-dependent.
+
+- **Mismatch:**
+ During the final link stage, the linker encounters these requests for absolute addresses within the 32-bit object code.
+ However, the linker's output target is a PIE, a format that explicitly forbids such absolute relocations because they would violate its defining characteristic of being relocatable.
+ The ``-fPIE`` flag, being a directive for a *compiler*, influences the code generation strategy for high-level languages like C++ but has no semantic effect on hand-written assembly that utilizes instructions which inherently produce absolute address relocations.
+ The linker, therefore, correctly identifies this violation of the PIE contract and terminates with an error.
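+
+The relocation arithmetic behind this mismatch can be checked with a small C++ sketch.
+It applies the psABI formulas ``S + A`` (``R_X86_64_32``) and ``S + A - P`` (``R_X86_64_PC32``) to a hypothetical image placed at different load bases. The offsets, the addend, and the higher-half base are illustrative assumptions, not values taken from the kernel.
+
```cpp
#include <cstdint>

// Relocation formulas from the System V AMD64 psABI, where
// S = symbol address, A = addend, P = address of the field being patched.
//   R_X86_64_32   : S + A      (absolute, zero-extended 32-bit field)
//   R_X86_64_PC32 : S + A - P  (relative to the patched field)
constexpr std::uint64_t apply_r_x86_64_32(std::uint64_t S, std::int64_t A) {
    return S + static_cast<std::uint64_t>(A);
}

constexpr std::int64_t apply_r_x86_64_pc32(std::uint64_t S, std::int64_t A,
                                           std::uint64_t P) {
    return static_cast<std::int64_t>(S) + A - static_cast<std::int64_t>(P);
}

// An R_X86_64_32 result must fit in 32 bits when zero-extended.
constexpr bool fits_r_x86_64_32(std::uint64_t value) {
    return value <= 0xFFFFFFFFull;
}

// Hypothetical image layout: the symbol lives 0x2000 bytes into the image and
// the 32-bit field referring to it lives at offset 0x1003.
constexpr std::uint64_t kSymbolOffset = 0x2000;
constexpr std::uint64_t kFieldOffset = 0x1003;
constexpr std::int64_t kPcRelAddend = -4;  // typical for a trailing 4-byte displacement

// Field values that would result if the image were placed at load_base.
constexpr std::uint64_t absolute_value(std::uint64_t load_base) {
    return apply_r_x86_64_32(load_base + kSymbolOffset, 0);
}

constexpr std::int64_t relative_value(std::uint64_t load_base) {
    return apply_r_x86_64_pc32(load_base + kSymbolOffset, kPcRelAddend,
                               load_base + kFieldOffset);
}
```
+
+The PC-relative value is the same for every load base. The absolute value changes with the base, and for a higher-half base it no longer fits in the 32-bit field at all.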
+
+Solution: Runtime Address Calculation
+-------------------------------------
+
+Resolution of this conflict necessitates the manual implementation of position-independent code within the 32-bit assembly module.
+The core principle of this technique is the elimination of all instructions that would otherwise generate absolute address relocations.
+Instead, the absolute address of any required symbol must be calculated at runtime relative to the current instruction pointer.
+
+- **The ``call``/``pop`` Idiom:**
+ The canonical technique for obtaining the value of the 32-bit instruction pointer (``EIP``) involves a ``call`` to the immediately subsequent instruction.
+ The ``call`` instruction pushes its return address—which is the address of the next instruction—onto the stack.
+ A ``pop`` instruction can then retrieve this value into a general-purpose register.
+
+ .. code-block:: gas
+
+ call .Lget_eip
+ .Lget_eip:
+ popl %ebx
+
+ Upon completion of this sequence, the ``%ebx`` register contains the absolute virtual address of the ``.Lget_eip`` label at runtime.
+ This address serves as a reliable anchor from which other symbols' addresses can be calculated.
+
+- **Establishing a Base Register:**
+ By convention, the i386 System V ABI uses ``%ebx`` as the base register in position-independent code, where it holds the address of the Global Offset Table, which makes it the natural choice for this purpose.
+ It is classified as a "callee-saved" register, which obligates any conforming function to preserve its value across calls.
+ By establishing ``%ebx`` as a base register at the commencement of the bootstrap sequence, its value can be reliably utilized for all subsequent address calculations within that scope, even after calling external C or C++ functions.
+ Using a "caller-saved" register like ``%eax`` would be incorrect, as its value would have to be considered invalid after every function call.
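+
+The address arithmetic enabled by the ``call``/``pop`` idiom can be modeled in C++ as a sanity check.
+The sketch below assumes the 5-byte ``call rel32`` encoding (opcode ``0xE8`` followed by a 32-bit displacement) and uses hypothetical link-time addresses. It shows that adding the link-time constant ``symbol - .Lbase`` to the popped value yields the symbol's runtime address for any load offset.
+
```cpp
#include <cstdint>

// Size of `call rel32` (opcode 0xE8 plus a 4-byte displacement) in 32-bit code.
constexpr std::uint32_t kCallRel32Size = 5;

// `call .Lget_eip` pushes the address of the next instruction, which is
// .Lget_eip itself; `popl %ebx` then moves that value into %ebx.
constexpr std::uint32_t call_pop_result(std::uint32_t runtime_call_address) {
    return runtime_call_address + kCallRel32Size;
}

// Hypothetical link-time addresses (as if the image were linked at 0).
constexpr std::uint32_t kLinkCall = 0x1000;                      // the call
constexpr std::uint32_t kLinkBase = kLinkCall + kCallRel32Size;  // .Lget_eip / .Lbase
constexpr std::uint32_t kLinkSymbol = 0x3000;                    // some data symbol

// (symbol - .Lbase) is resolved at build time; the code adds it to %ebx.
constexpr std::uint32_t runtime_symbol_address(std::uint32_t load_offset) {
    const std::uint32_t ebx = call_pop_result(kLinkCall + load_offset);
    return ebx + (kLinkSymbol - kLinkBase);
}
```
+
+For a load offset of ``0x100000``, ``%ebx`` holds ``0x101005`` and the computed address is ``0x103000``, which is exactly the link-time address shifted by the same offset.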
+
+Representative Implementations
+------------------------------
+
+The subsequent examples provide canonical implementations for converting common position-dependent assembly instructions into their PIE-compliant equivalents.
+These examples assume that a base register, ``%ebx``, has been initialized via the ``call``/``pop`` idiom with the runtime address of a label which, for the purpose of these examples, is designated ``.Lbase``.
+
+Accessing a Symbol's Address
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This pattern is applicable when passing a pointer to a symbol as a function argument.
+
+- Problematic Code:
+
+ .. code-block:: gas
+
+ pushl $message_prefix_panic
+
+- PIE-Compatible Solution:
+
+ .. code-block:: gas
+
+ // Calculate the address: base_address + (symbol_address - base_address).
+ // The term (message_prefix_panic - .Lbase) is a link-time constant offset.
+ leal (message_prefix_panic - .Lbase)(%ebx), %eax
+ pushl %eax
+
+Accessing a Symbol's Content
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This pattern is employed when reading from or writing to a global variable.
+
+- Problematic Code:
+
+ .. code-block:: gas
+
+ movl (vga_buffer_pointer), %esi
+
+- PIE-Compatible Solution:
+
+ .. code-block:: gas
+
+ // First, calculate the address of the pointer variable into a register.
+ leal (vga_buffer_pointer - .Lbase)(%ebx), %edi
+ // Then, dereference the pointer via the register to access its content.
+ movl (%edi), %esi
+
+Complex Addressing Modes
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+This pattern is frequently used for array access.
+
+- Problematic Code:
+
+ .. code-block:: gas
+
+ movl %eax, page_map_level_2(,%ecx,8)
+
+- PIE-Compatible Solution:
+
+ .. code-block:: gas
+
+ // Calculate the base address of the array into a register.
+ leal (page_map_level_2 - .Lbase)(%ebx), %edx
+ // Utilize the register as the base in the complex addressing mode.
+ movl %eax, (%edx, %ecx, 8)
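+
+The effective address of ``(%edx, %ecx, 8)`` is ``%edx + %ecx * 8``.
+The following C++ sketch uses a hypothetical layout. It checks that once ``%edx`` holds the runtime address of the array, element ``i`` is found ``8 * i`` bytes past the array's runtime location, wherever the image is loaded.
+
```cpp
#include <cstdint>

// Effective address of a base + index*scale + displacement operand,
// computed modulo 2^32 as in 32-bit protected mode.
constexpr std::uint32_t effective_address(std::uint32_t base, std::uint32_t index,
                                          std::uint32_t scale,
                                          std::uint32_t displacement = 0) {
    return base + index * scale + displacement;
}

// Hypothetical link-time layout: .Lbase at 0x1005, page_map_level_2 at 0x4000.
constexpr std::uint32_t kLinkBase = 0x1005;
constexpr std::uint32_t kLinkTable = 0x4000;
constexpr std::uint32_t kEntrySize = 8;  // each page-directory entry is 8 bytes

// Runtime address of entry i when the image is loaded load_offset bytes
// above its link address.
constexpr std::uint32_t runtime_entry_address(std::uint32_t load_offset,
                                              std::uint32_t i) {
    const std::uint32_t ebx = kLinkBase + load_offset;         // call/pop result
    const std::uint32_t edx = ebx + (kLinkTable - kLinkBase);  // leal ...(%ebx), %edx
    return effective_address(edx, i, kEntrySize);              // (%edx, %ecx, 8)
}
```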
+
+Far Jumps
+~~~~~~~~~
+
+This technique is required for critical operations such as loading a new Global Descriptor Table (GDT) and transitioning to 64-bit mode.
+
+- Problematic Code:
+
+ .. code-block:: gas
+
+ ljmp $global_descriptor_table_code, $_transition_to_long_mode
+
+- PIE-Compatible Solution (using ``lret``):
+
+ .. code-block:: gas
+
+ // Calculate the absolute virtual address of the 64-bit entry point.
+ leal (_transition_to_long_mode - .Lbase)(%ebx), %eax
+
+ // Push the new segment selector and the calculated address onto the stack.
+ pushl $global_descriptor_table_code
+ pushl %eax
+
+ // lret performs a far return, using the values from the stack,
+ // thereby achieving an indirect, position-independent far jump.
+ lret
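+
+The stack discipline that makes this work can be modeled in C++.
+The sketch below is a simplified model of ``lret`` with a 32-bit operand size. Each ``pushl`` stores a 4-byte slot, and ``lret`` pops the offset first and then the slot holding the selector, keeping only its low 16 bits. The type and function names are hypothetical, and descriptor and privilege checks are omitted.
+
```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Simplified 32-bit stack: grows downward in 4-byte slots.
struct Stack32 {
    static constexpr std::size_t kSlots = 16;
    std::array<std::uint32_t, kSlots> slots{};
    std::size_t top = kSlots;  // index of the most recently pushed slot

    constexpr void push(std::uint32_t value) { slots[--top] = value; }  // pushl
    constexpr std::uint32_t pop() { return slots[top++]; }              // popl
};

struct FarTarget {
    std::uint16_t cs;
    std::uint32_t eip;
};

// lret with 32-bit operand size: pop EIP, then pop a 32-bit slot and keep
// its low 16 bits as the new CS selector.
constexpr FarTarget lret(Stack32& stack) {
    const std::uint32_t eip = stack.pop();
    const auto cs = static_cast<std::uint16_t>(stack.pop() & 0xFFFF);
    return FarTarget{cs, eip};
}

// Mirrors the assembly: push the selector first, then the computed address.
constexpr FarTarget far_return_to(std::uint16_t selector, std::uint32_t entry) {
    Stack32 stack{};
    stack.push(selector);
    stack.push(entry);
    return lret(stack);
}
```
+
+For example, ``far_return_to(0x08, 0x00101234)`` yields ``cs == 0x08`` and ``eip == 0x00101234``. The push order in the assembly is therefore the reverse of the order in which ``lret`` consumes the values.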
+
+.. rubric:: References
+
+.. [#1] M. Matz, J. Hubička, A. Jaeger, and M. Mitchell, “System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version,” 2012. Available: https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf
diff --git a/docs/briefs/tb0002-x86_64_bootstrap.rst b/docs/briefs/tb0002-x86_64_bootstrap.rst
new file mode 100644
index 0000000..b7a6c2a
--- /dev/null
+++ b/docs/briefs/tb0002-x86_64_bootstrap.rst
@@ -0,0 +1,154 @@
+Technical Brief 0002: x86-64 Bootstrap Subsystem
+================================================
+
+System Requirements and Constraints
+-----------------------------------
+
+The design of a clean-slate, C++23-based operating system kernel necessitates a low-level bootstrap subsystem.
+This subsystem manages the transition from the machine's power-on state to a controlled 64-bit execution environment.
+It must operate under several architectural and toolchain constraints:
+
+1. **Bootloader Conformance:** The kernel is loaded by a bootloader adhering to the Multiboot2 Specification.
+ This conformance establishes a critical contract between the bootloader and the kernel.
+ The bootstrap code must therefore correctly identify the Multiboot2 magic number (``0x36d76289``) passed in the ``%eax`` register.
+ It must also interpret the pointer to the boot information structure passed in ``%ebx`` [1]_.
+ Adhering to this standard decouples the kernel from any specific bootloader implementation, ensuring portability across compliant environments like GRUB 2.
+
+2. **CPU Mode Transition:** The CPU is assumed to be in 32-bit protected mode, with paging disabled, upon entry to the bootstrap code.
+ The subsystem is responsible for all requisite steps to enable 64-bit long mode, which is a non-trivial process.
+ It involves enabling Physical Address Extension (PAE) via the ``%cr4`` control register, loading the physical address of the top-level page table into ``%cr3``, setting the Long Mode Enable (LME) bit in the Extended Feature Enable Register (EFER) MSR (``0xC0000080``), and finally enabling paging via the ``%cr0`` control register.
+
+3. **Position-Independent Executable (PIE):** The kernel is compiled and linked as a PIE to allow it to be loaded at an arbitrary physical address.
+ This imposes a strict constraint on the 32-bit assembly code: it must not contain any absolute address relocations.
+ While a C++ compiler can generate position-independent code automatically, in hand-written assembly this requires the manual calculation of all symbol addresses at runtime.
+ This is a significant departure from simpler, absolute-addressed code.
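+
+The register updates listed under the second requirement reduce to setting individual architectural bits: ``CR4.PAE`` (bit 5), ``EFER.LME`` (bit 8) of MSR ``0xC0000080``, and ``CR0.PG`` (bit 31). After these are set, the processor itself sets ``EFER.LMA`` (bit 10).
+The C++ sketch below models these updates on plain integers and does not access real hardware. The ``ControlState`` structure and the function names are hypothetical. The sketch also loads ``%cr3``, which must point to the page tables before paging is enabled.
+
```cpp
#include <cstdint>

// Architectural bit positions (Intel SDM, AMD APM).
constexpr std::uint32_t kEferMsr = 0xC0000080;              // IA32_EFER
constexpr std::uint64_t kCr0Pe = std::uint64_t{1} << 0;     // protection enable
constexpr std::uint64_t kCr0Pg = std::uint64_t{1} << 31;    // paging
constexpr std::uint64_t kCr4Pae = std::uint64_t{1} << 5;    // physical address extension
constexpr std::uint64_t kEferLme = std::uint64_t{1} << 8;   // long mode enable
constexpr std::uint64_t kEferLma = std::uint64_t{1} << 10;  // long mode active (set by CPU)

// Hypothetical model of the registers involved. The defaults describe the
// Multiboot2 hand-off state: protected mode on, paging off.
struct ControlState {
    std::uint64_t cr0 = kCr0Pe;
    std::uint64_t cr3 = 0;
    std::uint64_t cr4 = 0;
    std::uint64_t efer = 0;
};

// Setting CR0.PG with EFER.LME set but CR4.PAE clear raises #GP on hardware.
constexpr bool paging_would_fault(const ControlState& s) {
    return (s.efer & kEferLme) != 0 && (s.cr4 & kCr4Pae) == 0;
}

constexpr bool enable_paging(ControlState& s) {
    if (paging_would_fault(s)) {
        return false;
    }
    s.cr0 |= kCr0Pg;
    if ((s.efer & kEferLme) != 0) {
        s.efer |= kEferLma;  // the processor is now in IA-32e (long) mode
    }
    return true;
}

// The transition in order: PAE, page-table root, LME, and paging last.
constexpr ControlState enter_ia32e(std::uint64_t pml4_physical) {
    ControlState s{};
    s.cr4 |= kCr4Pae;       // write back to %cr4
    s.cr3 = pml4_physical;  // write the PML4 address to %cr3
    s.efer |= kEferLme;     // rdmsr/wrmsr on IA32_EFER
    enable_paging(s);       // write back to %cr0
    return s;
}
```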
+
+Architectural Overview
+----------------------
+
+The bootstrap architecture is partitioned into three distinct components, which enforces a modular and verifiable transition sequence.
+The components are a shared C++/assembly interface (``boot.hpp``), a 32-bit PIE transition stage (``boot32.S``), and a minimal 64-bit entry stage (``entry64.s``).
+This separation is a deliberate design choice to manage complexity.
+It isolates mode-specific logic, preventing subtle bugs that could arise from mixing 32-bit and 64-bit concerns.
+It also makes the state transition between stages explicit and auditable, which is critical both for debugging and for the educational utility of the codebase.
+
+Component Analysis
+------------------
+
+C++/Assembly Interface (``boot.hpp``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A single header file serves as the definitive interface between assembly code and C++.
+This is achieved through the ``__ASSEMBLER__`` preprocessor macro. GCC and Clang predefine it when preprocessing assembly (``.S``) sources, so a single file can serve a dual purpose.
+
+* **Shared Constants:** The header defines all magic numbers (e.g., ``MULTIBOOT2_MAGIC``), GDT flags, and other constants required by both the assembly and C++ code.
+ This ensures a single source of truth, eliminating the risk of inconsistencies that could arise from maintaining parallel definitions in different language domains.
+
+* **Conditional Declarations:** C++-specific declarations, such as ``extern "C"`` variable declarations using the ``teachos::arch::asm_pointer`` wrapper, are confined within an ``#ifndef __ASSEMBLER__`` block.
+ This prevents the assembler from attempting to parse C++ syntax—which would result in a compilation error—while making the full, type-safe interface available to the C++ compiler.
+ The ``asm_pointer`` class is particularly important.
+ It encapsulates a raw address and prevents its unsafe use as a standard pointer within C++, forcing any interaction to be explicit and controlled.
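+
+The dual-purpose header pattern can be sketched as follows. This is not the project's ``boot.hpp``.
+The macro name ``MULTIBOOT2_MAGIC`` comes from the description above. The namespace ``example``, the class ``asm_pointer_sketch``, and the symbol ``example_boot_symbol`` are hypothetical stand-ins. They only illustrate how an address wrapper can refuse implicit conversion to a C++ pointer, and they make no claim about the interface of ``teachos::arch::asm_pointer``.
+
```cpp
// Hypothetical dual-purpose header, usable from both .S and C++ sources.
#define MULTIBOOT2_MAGIC 0x36d76289

#ifndef __ASSEMBLER__
#include <cstdint>
#include <type_traits>

namespace example {

// Wraps a raw address exported by assembly code. There is deliberately no
// conversion operator, so the address cannot silently become a T*.
template <typename T>
class asm_pointer_sketch {
public:
    constexpr explicit asm_pointer_sketch(std::uintptr_t address) : address_{address} {}

    constexpr std::uintptr_t address() const { return address_; }

    // Turning the address into a pointer must be spelled out at the call site.
    T* unsafe_get() const { return reinterpret_cast<T*>(address_); }

private:
    std::uintptr_t address_;
};

}  // namespace example

// C++-only declaration; the assembler never sees this syntax.
extern "C" example::asm_pointer_sketch<std::uint64_t> example_boot_symbol;

#endif  // __ASSEMBLER__
```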
+
+32-bit Transition Stage (``boot32.S``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This file contains all code and data necessary to prepare the system for long mode.
+Its logic is fundamentally incompatible with the 64-bit environment due to differences in stack width, calling conventions, and instruction encoding.
+
+* **Position-Independent Executable (PIE) Compliance:** The 32-bit x86 ISA lacks a native instruction-pointer-relative addressing mode for data accesses.
+ To satisfy the PIE constraint, all symbol addresses are calculated at runtime.
+ This is achieved via the ``call/pop`` idiom, which retrieves the value of the instruction pointer (``%eip``) into a base register, ``%esi``. Like ``%ebx``, ``%esi`` is callee-saved under the i386 System V ABI.
+ All subsequent memory accesses are then performed by calculating a link-time constant offset from this runtime base (e.g., ``leal (symbol - .Lbase)(%esi), %eax``).
+ This manual implementation of position independence is critical to avoid linker errors related to absolute relocations (``R_X86_64_32``) in a PIE binary.
+
+* **System State Verification:** The first actions are a series of assertions.
+ The code first verifies the Multiboot2 magic number (``0x36d76289``) passed in ``%eax`` [1]_.
+ It then uses the ``CPUID`` instruction to verify that the processor supports long mode.
+ This is done by checking for the LM bit (bit 29) in ``%edx`` after executing ``CPUID`` with ``0x80000001`` in ``%eax`` [2]_.
+ Failure of any assertion results in a call to a panic routine that halts the system.
+ This "fail-fast" approach is crucial; proceeding in an unsupported environment would lead to unpredictable and difficult-to-debug faults deep within the kernel.
+
+* **Formal Transition via ``lret``:** The stage concludes with a ``lret`` (long return) instruction.
+ A far (inter-segment) control transfer is required here: only a far transfer loads a new code segment selector, and the descriptor it selects determines the CPU's execution mode.
+ A near ``jmp`` is insufficient because it does not reload ``%cs``.
+ The choice of ``lret`` over other far-control transfer instructions like ``ljmp`` or ``lcall`` is a direct consequence of the PIE constraint.
+ The direct forms of ``ljmp`` and ``lcall`` require their target address to be a link-time constant.
+ This would embed an absolute address into the executable and violate the principles of position independence.
+ In contrast, ``lret`` consumes its target selector and offset from the stack.
+ This mechanism is perfectly suited for a PIE environment.
+ It allows for a dynamically calculated, position-independent address to be pushed onto the stack immediately before the instruction is executed.
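+ Furthermore, ``lcall`` is architecturally inappropriate.
+ It would push a 32-bit return frame (``%cs`` and ``%eip``) onto the stack before the mode switch. That frame would remain as stale data on the 64-bit stack, even though the transition should be strictly one-way.
+ ``lret`` correctly models this one-way transfer. An indirect ``ljmp`` through a far pointer assembled in memory at runtime would also avoid absolute relocations, but it needs a writable memory slot, whereas ``lret`` uses only the stack.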
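+
+Both assertions can be written as pure predicates over register values, which makes the logic easy to unit-test off-target.
+In the C++ sketch below, the arguments stand for the value of ``%eax`` on entry and for the registers returned by ``CPUID``. The function names are hypothetical.
+As an additional precaution not described above, the sketch also requires leaf ``0x80000000`` to report that the extended leaf ``0x80000001`` exists before its ``%edx`` bits are trusted.
+
```cpp
#include <cstdint>

constexpr std::uint32_t kMultiboot2BootloaderMagic = 0x36d76289;
constexpr std::uint32_t kCpuidExtendedFeatures = 0x80000001;
constexpr std::uint32_t kEdxLongModeBit = std::uint32_t{1} << 29;  // LM

// %eax on entry must hold the Multiboot2 bootloader magic.
constexpr bool is_multiboot2_handoff(std::uint32_t eax_on_entry) {
    return eax_on_entry == kMultiboot2BootloaderMagic;
}

// CPUID leaf 0x80000000 returns the highest supported extended leaf in %eax.
constexpr bool has_extended_features_leaf(std::uint32_t max_extended_leaf) {
    return max_extended_leaf >= kCpuidExtendedFeatures;
}

// After CPUID with %eax = 0x80000001, bit 29 of %edx reports long mode support.
constexpr bool supports_long_mode(std::uint32_t max_extended_leaf,
                                  std::uint32_t edx_of_80000001) {
    return has_extended_features_leaf(max_extended_leaf) &&
           (edx_of_80000001 & kEdxLongModeBit) != 0;
}
```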
+
+64-bit Entry Stage (``entry64.s``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This file provides a minimal, clean entry point into the 64-bit world.
+It ensures the C++ kernel begins execution in a pristine environment.
+
+* **Final State Setup:** It has two responsibilities. First, it loads the data segment registers (``%ss``, ``%ds``, etc.) with the data selector from the new GDT. Second, it transfers control to the C++ kernel's ``main`` function via a standard ``call``.
+ Setting the segment registers is the first action performed, so that no selector referring to the bootloader's GDT remains loaded.
+ 64-bit mode ignores the base and limit of these segments and does not fault on memory accesses through null data selectors. A stale selector can still fault when it is reloaded later, however, for example when an ``iretq`` restores ``%ss`` [2]_.
+
+* **Halt State:** Should ``main`` ever return—an event that signifies a critical kernel failure—execution falls through to an infinite ``hlt`` loop.
+ This is a crucial fail-safe.
+ It prevents the CPU from executing beyond the end of the kernel's code, which would lead to unpredictable behavior as the CPU attempts to interpret non-executable data as instructions.
+
+Key Implementation Decisions
+----------------------------
+
+``lret`` Stack Frame Construction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
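+The transition to 64-bit mode is initiated by executing an ``lret`` instruction from 32-bit code.
+At this point long mode is already active in its compatibility submode, because paging has been enabled with ``EFER.LME`` set. The far return completes the switch into 64-bit mode.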
+The behavior of this instruction is determined by the characteristics of the destination code segment descriptor referenced by the selector on the stack.
+
+The stack is prepared as follows:
+
+1. ``leal (_entry64 - .Lbase)(%esi), %eax``: The PIE-compatible virtual address of the 64-bit entry point is calculated and placed in ``%eax``.
+
+2. ``pushl $global_descriptor_table_code``: The 16-bit selector for the 64-bit code segment is pushed onto the stack as a 32-bit value.
+
+3. ``pushl %eax``: The 32-bit address of the entry point is pushed onto the stack.
+
+When ``lret`` is executed with a 32-bit operand size, it pops a 32-bit instruction pointer and then a 32-bit stack slot whose low 16 bits become the new code selector [3]_.
+The processor then examines the GDT descriptor referenced by the new code selector.
+Because this descriptor has its L-bit (Long Mode) set to 1 and its D-bit clear, the processor switches from compatibility mode into 64-bit mode.
+It then begins executing at the popped instruction pointer, zero-extended to 64 bits [2]_.
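+
+While IA-32e mode is active, the processor chooses the execution mode from two descriptor bits: ``L`` (bit 53) and ``D`` (bit 54). The following C++ sketch models that choice.
+The descriptor values are assumptions used for illustration, not a copy of the kernel's GDT. ``0x0020980000000000`` is the common minimal 64-bit code descriptor, and ``0x00CF9A000000FFFF`` is a typical flat 32-bit code descriptor.
+
```cpp
#include <cstdint>

enum class CodeMode { Compat16, Compat32, Long64, Invalid };

constexpr std::uint64_t kDescL = std::uint64_t{1} << 53;  // 64-bit code segment
constexpr std::uint64_t kDescD = std::uint64_t{1} << 54;  // default operand size

// Execution mode selected by a code-segment descriptor while IA-32e mode is active.
constexpr CodeMode mode_for(std::uint64_t descriptor) {
    const bool l = (descriptor & kDescL) != 0;
    const bool d = (descriptor & kDescD) != 0;
    if (l && d) return CodeMode::Invalid;  // reserved combination, #GP on use
    if (l) return CodeMode::Long64;
    return d ? CodeMode::Compat32 : CodeMode::Compat16;
}

// Present (47) | code/data (44) | executable (43) | L (53).
constexpr std::uint64_t kCode64 = (std::uint64_t{1} << 47) | (std::uint64_t{1} << 44) |
                                  (std::uint64_t{1} << 43) | kDescL;
// Flat 32-bit code: base 0, limit 4 GiB, G = 1, D = 1.
constexpr std::uint64_t kCode32 = 0x00CF9A000000FFFFull;
```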
+
+Memory Virtualization and GDT
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A four-level page table hierarchy (PML4) is constructed to enable paging, a prerequisite for long mode.
+An initial identity map of 32 MiB of physical memory is created using 2 MiB huge pages.
+With 2 MiB pages, the 32 MiB region needs only 16 page-directory entries instead of 8192 page-table entries for 4 KiB pages, which keeps the initial tables small.
+A recursive mapping in the PML4 at a conventional index (511) is also established.
+This powerful technique allows the C++ kernel's memory manager to access and modify the entire page table hierarchy as if it were a linear array at a single, well-known virtual address.
+This greatly simplifies the logic required for virtual memory operations.
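+
+Both mapping techniques reduce to simple arithmetic, as the following C++ sketch shows under the assumption of 4-level paging with 48-bit canonical addresses.
+First, it builds the 16 page-directory entries that map 32 MiB with 2 MiB pages, using the flags present, writable, and page size (bits 0, 1 and 7).
+Second, it computes the virtual address at which a recursive PML4 entry at index 511 exposes a given page table. The function names are hypothetical.
+
```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kPresent = 1u << 0;
constexpr std::uint64_t kWritable = 1u << 1;
constexpr std::uint64_t kHugePage = 1u << 7;  // PS bit: entry maps a 2 MiB page
constexpr std::uint64_t kTwoMiB = 2ull * 1024 * 1024;
constexpr std::size_t kIdentityEntries = (32ull * 1024 * 1024) / kTwoMiB;  // 16

// Page-directory entries identity-mapping the first 32 MiB.
constexpr std::array<std::uint64_t, kIdentityEntries> identity_map_32mib() {
    std::array<std::uint64_t, kIdentityEntries> entries{};
    for (std::size_t i = 0; i < entries.size(); ++i) {
        entries[i] = (i * kTwoMiB) | kPresent | kWritable | kHugePage;
    }
    return entries;
}

// Sign-extend bit 47 to form a canonical 48-bit virtual address.
constexpr std::uint64_t canonical(std::uint64_t address) {
    return (address & (std::uint64_t{1} << 47)) ? (address | 0xFFFF000000000000ull)
                                                : address;
}

constexpr std::uint64_t kRecursiveIndex = 511;

// Virtual address of the page table reached via PML4[pml4] -> PDPT[pdpt] -> PD[pd]
// when PML4[511] points back at the PML4 itself.
constexpr std::uint64_t page_table_address(std::uint64_t pml4, std::uint64_t pdpt,
                                           std::uint64_t pd) {
    return canonical((kRecursiveIndex << 39) | (pml4 << 30) | (pdpt << 21) | (pd << 12));
}

// Following index 511 at every level ends at the PML4 itself.
constexpr std::uint64_t pml4_address() {
    return page_table_address(kRecursiveIndex, kRecursiveIndex, kRecursiveIndex);
}
```
+
+With the recursive entry at index 511, the PML4 becomes visible at ``0xFFFFFFFFFFFFF000``. This is the single, well-known address the paragraph above refers to.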
+
+A new GDT is defined containing the necessary null, 64-bit code, and 64-bit data descriptors.
+The first entry in the GDT must be a null descriptor, as the processor architecture reserves selector value 0 as a special "null selector."
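+Loading ``%ds``, ``%es``, ``%fs``, or ``%gs`` with this null selector is valid. In protected or compatibility mode, however, any subsequent memory access through such a register generates a general-protection exception, and loading ``%cs`` or ``%ss`` with the null selector faults immediately. 64-bit mode relaxes the runtime checks for data segments.
+This provides a fail-safe mechanism against the use of uninitialized segment selectors [2]_.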
+The selector for the data descriptor is exported as a global symbol (``global_descriptor_table_data``).
+This choice prioritizes explicitness and debuggability, because the dependency between the two stages is clearly visible in the source code.
+The alternative, passing the selector value in a register, would create an implicit, less obvious contract between the two stages that could complicate future maintenance.
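+
+The layout and the selector arithmetic can be sketched in C++.
+A selector consists of the descriptor index shifted left by three, the table indicator in bit 2, and the requested privilege level in bits 0 and 1. Any GDT selector with index 0 is a null selector.
+The descriptor values below are the common minimal 64-bit code and data descriptors. They are assumptions for illustration, not a copy of the kernel's table, and the helper names are hypothetical.
+
```cpp
#include <array>
#include <cstdint>

// Minimal long-mode GDT: null, 64-bit code (P|S|E|L), data (P|S|W).
constexpr std::array<std::uint64_t, 3> kGdt = {
    0x0000000000000000ull,  // index 0: null descriptor
    0x0020980000000000ull,  // index 1: 64-bit code
    0x0000920000000000ull,  // index 2: data
};

// Selector = index << 3 | TI (0 = GDT) << 2 | RPL.
constexpr std::uint16_t make_selector(std::uint16_t index, std::uint16_t rpl = 0) {
    return static_cast<std::uint16_t>((index << 3) | (rpl & 0x3));
}

// Index 0 with TI = 0 is the null selector, whatever the RPL.
constexpr bool is_null_selector(std::uint16_t selector) {
    return (selector & 0xFFFC) == 0;
}

constexpr std::uint16_t kCodeSelector = make_selector(1);  // 0x08
constexpr std::uint16_t kDataSelector = make_selector(2);  // 0x10
```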
+
+References
+----------
+
+.. [1] Free Software Foundation, "The Multiboot2 Specification, version 2.0," Free Software Foundation, Inc., 2016. `Online <https://www.gnu.org/software/grub/manual/multiboot2/multiboot.html>`_.
+
+.. [2] Intel Corporation, *Intel® 64 and IA-32 Architectures Software Developer’s Manual, Combined Volumes 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4*, Order No. 325462-081US, July 2025. `Online <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html>`_.
+
+.. [3] AMD, Inc., *AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions*, Publication No. 24594, Rev. 3.42, June 2025. `Online <https://www.amd.com/en/support/tech-docs/amd64-architecture-programmers-manual-volumes-1-5>`_.