Debugging Stories: Stack alignment matters

May 12th, 2017

In systems programming, bugs in the execution environment can manifest in unexpected ways. This makes debugging a challenge, since when a program crashes, it’s unclear which layer(s) of abstraction contains the bug. If the application itself is fine, an incorrectly-initialised language runtime can still cause it to crash. A bug in the compiler can cause it to generate faulty machine code. A kernel bug can bring down the entire system. And then there’s the possibility of hardware bugs.

What follows is a story of undefined behaviour and poorly documented compiler conventions, where I peel back the layers of abstraction to fix a subtle bug in our user-level thread initialisation code.

A mystery…

The story begins with an email from a customer building an app with seL4. A thread was faulting, and the fault handler was displaying:

FAULT HANDLER: user exception (number 13, code 0)
from some_process.some_interface_thread (ID 0x4),
pc = 0x10154c, sp = 0x233c44, flags = 0x10212

What’s this message actually telling us?
number 13: The fault number. This was running on an x86 machine, so fault 13 is a general protection fault.
some_process.some_interface_thread: The name of the faulting thread (the faulting thread had a different name, but I’ve renamed it for clarity).
pc = 0x10154c: Address of the instruction that triggered the fault.
sp = 0x233c44: Stack pointer when the fault was triggered.

The email also included a snippet from an objdump of the faulting program:

10154c:       0f 28 44 24 10          movaps 0x10(%esp),%xmm0

Unfortunately, the customer couldn’t show me their source code. Nonetheless, the fault message and faulting instruction tell us a lot.

movaps

The movaps instruction moves four single-precision floating-point values between two XMM registers, or between XMM registers and memory. When moving to or from memory, a general protection fault will occur unless the address is aligned to a 16-byte boundary. The faulting code was attempting to execute movaps 0x10(%esp),%xmm0. The source operand, 0x10(%esp), refers to the memory at stack pointer + 16 bytes. When movaps is executed, the stack pointer is 0x233c44, which is not aligned to 16 bytes, so neither is 0x233c44 + 16. Thus, the processor emitted a general protection fault.

Reproducing the fault

Why was movaps being executed with an incorrectly-aligned stack? This is a compiled C program, so gcc must have assumed that the stack pointer would be 16-byte aligned at the point where the movaps instruction was executed. Yet when the program actually ran and reached the movaps instruction, the stack pointer was not 16-byte aligned.

To solve this mystery, I set out to reproduce the conditions under which this fault seemed to occur. Our customer is building their program with the CAmkES application framework. In CAmkES, a process’s threads are either “control” threads, which perform application-specific tasks, or “interface” threads, which communicate with other processes. From the faulting thread’s name, I knew it was an interface thread. I wanted to get the compiler to emit a movaps instruction with an operand relative to the stack pointer, and execute the resulting code in a CAmkES interface thread, to see if it would fault. My approach was to perform lots of copies of stack-allocated float arrays to present the compiler with opportunities to use movaps.

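#include <string.h>  /* memcpy */

typedef struct foo {
    float arr[4];    /* 16 bytes: the size of one XMM register */
} foo_t;

/* Copy *a to *b through a stack-allocated temporary, to tempt the
 * compiler into using 16-byte SSE moves relative to the stack pointer. */
static inline void memcpy_test(foo_t *a, foo_t *b) {
    foo_t intermediate = *a;
    intermediate.arr[1] += 0.5;
    memcpy(b, &intermediate, sizeof(foo_t));
}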

Calling this inline function in various contexts eventually yielded the result I wanted. Here’s the snippet from an objdump of the test app I made:

memcpy(b, &intermediate, sizeof(foo_t));
10112d: 66 0f 6f 44 24 20 movdqa 0x20(%esp),%xmm0
101133: 0f 29 44 24 10 movaps %xmm0,0x10(%esp)

And running it:

FAULT HANDLER: user exception (number 13, code 0)
from server.b (ID 0x3),
pc = 0x10112d, sp = 0x229c74, flags = 0x1024

That’s the fault I was looking for. Sort of. Look carefully at the faulting instruction pointer. The faulting instruction was actually the movdqa immediately preceding the movaps. Much like movaps, the movdqa instruction copies 128 bits from its source operand to its destination operand, and faults if address operands are not 16-byte aligned.

Why is this happening?

What makes the compiler think it’s safe to emit these instructions? It’s clearly assuming something about the alignment of the stack pointer, but what? The Wikipedia page on x86 calling conventions says that “the stack must be aligned to a 16-byte boundary when calling a function”, but the citation for this claim points to this bug report. To gain some more insight, I experimented with the following assembly function, which returns the value the stack pointer had immediately after it was called.

.global getsp
 .text
 getsp:
     mov %esp, %eax
     ret

Try this at home. Call the above function from a C program with the prototype uintptr_t getsp(void);. On 32-bit x86 machines, it should always return addresses ending in hex c (i.e. 12). Such addresses are all 4 less than a 16-byte aligned address. If the stack pointer is 16-byte aligned at the point of the call, then after the call instruction pushes the (4-byte) return address, the stack pointer is 4 bytes lower, as the stack grows downwards. This is consistent with what Wikipedia suggested. The compiler maintains a 16-byte alignment of the stack pointer whenever a function is called, adding padding to the stack as necessary. Because it assumes that every function is entered with the stack aligned this way, it can emit instructions with alignment requirements without risk of triggering their fault conditions, as long as that assumption holds.

RTFM

Much later, after I’d resolved the problem, (in fact it was part of the feedback on a draft of this post), someone pointed out that somewhere in gcc’s 14000-line man page, there’s a section:

-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary.  If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits)

So this behaviour is documented after all! Man pages are a great way to learn more about a feature you already know exists. However, large man pages aren’t very useful when you don’t know what you’re looking for.

Tracking down the problem

I used my getsp function to check the stack alignment at various points of my faulting program to see where the incorrect alignment was introduced. My first experiment was calling it from an interface thread and from a control thread, to see if the problem was occurring on all threads. On the control thread, getsp consistently returned addresses ending in hex c (i.e. 12), which is correct after compensating for the return address. On an interface thread, it consistently returned addresses ending in a 0.

So control threads are fine and interface threads are broken. What’s the difference between control and interface threads? All CAmkES threads start executing in some hand-written assembly that performs some CAmkES-specific initialisation. Then, control threads jump to the C library’s _start function, which initialises the C library, and on 32-bit x86, aligns the stack pointer to a 16-byte boundary, before calling main. Interface threads don’t go through _start, as every process has a control thread which initialises the C library, so there’s no need for interface threads to re-initialise it. This means that if the stack pointer does not happen to be 16-byte aligned when an interface thread first calls into compiled C code, the misalignment persists across all subsequent function calls, because the compiler only preserves whatever alignment it was given.

Found it!

To summarise the problem: The first function call made by CAmkES interface threads from hand-written assembly took place with a stack pointer that was not aligned to a 16-byte boundary. Functions are compiled under the assumption that they will always be called with a 16-byte aligned stack pointer. The compiler emits instructions with alignment requirements on their operands, and uses addresses relative to the stack pointer as operands to such instructions. One of these instructions was given an unaligned address as an operand and triggered a fault, because the stack was not aligned as the compiler assumed when the first function call was made.

The solution: modify the code that starts a thread such that threads are started with a 16-byte aligned stack pointer, and modify the assembly at the thread entry point, to pad the stack, ensuring correct alignment for the first function call.

The take-away

Starting a thread is more complicated than pointing the processor at an entry point and saying “go”. Correctly initialising the language runtime environment for the program that’s about to start running is important to prevent crashes, or worse, undefined behaviour. Despite its low level of abstraction, C still has a runtime environment, albeit a minimal one, and that environment includes assumptions made by the compiler. When linking hand-written assembly with C, or starting a new thread, there are rules that must be adhered to, and they might not be clearly written down.

2 comments

  1. Hello,

    Isn’t seL4 supposed to be proven, so how can this kind of bug occur? Or was this bug outside the Trusted Computing Base of seL4?

    Sincerely yours,
    D. Mentré

    1. Hi David,

      Yes, this occurred outside of the trusted code base of the seL4 kernel. Specifically, it’s a bug in the userspace library code, which is completely separate from the verified kernel code 🙂
