Tech Notes: Debugging LLVM + Rust
I’m working on a programming language, writing the compiler in rust. I’m stuck at this point from a segfault that occurs with the following IR (generated by my compiler):
; ModuleID = 'main'
source_filename = "main"
define void @main() {
entry:
%result = call i64 @fib(i64 1)
}
define i64 @fib(i64) {
entry:
%alloca = alloca i64
store i64 %0, i64* %alloca
%load = load i64, i64* %alloca
switch i64 %load, label %switchcomplete [
i64 0, label %case
i64 1, label %case1
]
switchcomplete: ; preds = %case1, %entry, %case
%load2 = load i64, i64* %alloca
%binop = sub i64 %load2, 1
%result = call i64 @fib(i64 %binop)
%load3 = load i64, i64* %alloca
%binop4 = sub i64 %load3, 2
%result5 = call i64 @fib(i64 %binop4)
%binop6 = add i64 %result, %result5
ret i64 %binop6
case: ; preds = %entry
ret i64 0
br label %switchcomplete
case1: ; preds = %entry
ret i64 1
br label %switchcomplete
This segfaults whenever I run my compiler, which currently compiles the code and immediately executed it in LLVM’s MCJIT.
Mystery SIGSEGV #
Whenever I run my code in my debugger, I find that I have a segfault which doesn’t occur (at least at the same time) as when I run my app on the command line.
VS Code’s debugger returns:
so something is happening during the FPPassManager. Apparently the FPPassManager is what handles generating code for functions (read in the source code)
getNumSuccessors was a bit nebulous for me… what does this function actually do? I wasn’t familiar with the term “successor”: it must be something custom to LLVM. Some Googling finds: http://llvm.org/docs/ProgrammersManual.html#iterating-over-predecessors-successors-of-blocks
So I guess successor is referring to the number of statements that immediately follow the existing statement. getNumSuccessors in core.h of llvm specifies there are function calls for a terminator. So what precisely is a terminator?
Looking through the LLVM source code again, it’s the classification for instructions that will terminate a BasicBlock. The list from LLVM9 looks like:
/* Terminator Instructions */
LLVMRet = 1,
LLVMBr = 2,
LLVMSwitch = 3,
LLVMIndirectBr = 4,
LLVMInvoke = 5,
/* removed 6 due to API changes */
Looking at the traceback, this is specifically occurring in the updatePostDominatedByUnreachable. The source code for that is:
/// Add \p BB to PostDominatedByUnreachable set if applicable.
void
BranchProbabilityInfo::updatePostDominatedByUnreachable(const BasicBlock *BB) {
const Instruction *TI = BB->getTerminator();
if (TI->getNumSuccessors() == 0) {
if (isa<UnreachableInst>(TI) ||
// If this block is terminated by a call to
// @llvm.experimental.deoptimize then treat it like an unreachable since
// the @llvm.experimental.deoptimize call is expected to practically
// never execute.
BB->getTerminatingDeoptimizeCall())
PostDominatedByUnreachable.insert(BB);
return;
}
The actual errors occurs on the first instruction of the assembly instruction:
; id = {0x00012806}, range = [0x000000000093fbb0-0x000000000093fc3b), name="llvm::TerminatorInst::getNumSuccessors() const", mangled="_ZNK4llvm14TerminatorInst16getNumSuccessorsEv"
; Source location: unknown
555555E93BB0: 0F B6 47 10 movzbl 0x10(%rdi), %eax
555555E93BB4: 48 8D 15 81 3B D5 01 leaq 0x1d53b81(%rip), %rdx
555555E93BBB: 83 E8 18 subl $0x
I can’t read assembler very well. But since this is a method, most likely the first instruction has to do with loading the current object into memory. Most likely then, getNumSuccessors is receiving a pointer to something it doesn’t expect. Most likely this is an NPE.
My hunch now is I have a basic block without a terminator statement, causing the JIT pass to fail.
There was a missing return statement on the main function. Adding that didn’t change anything.
Fixing the blocks to only have terminators did indeed fix the issue! Ultimately figuring out that a validator existed, and heeding it’s error messages lead to the solution.
https://github.com/toumorokoshi/disp/commit/1591788b8fc1871f1211c8ae6114e4d9a3fdf397