More fun with Stack Overflow Errors

In a previous article, we found that the amount of memory I can allocate over the main function on the stack is  8379000 ± 1400 bytes for my machine. Today we’ll use this result to learn a bit more about our machines.

Take a look at this  C program:

At any instant during the execution of this program,  count will contain the number of times  func has been called, or, the number of stack frames of the function  func over  main. It also prints the value of  count at every invocation. I ran this program a couple of times and the program crashed after around  523500 calls to the function  func.

This tells us that the  8379000 bytes of stack memory available to our program is exhausted after  523500 calls of  func. So the amount of memory allocated on the stack for one invocation of the function  func is  8379000 / 523500 ≈ 16.00  bytes.

That’s when I used the  GCC compiler with -O0 flag. I get similar results with -O1, but for -O2 and -O3, the program never crashes! It keeps on printing numbers ad infinitum. Why’s that?

Hello Compiler Optimizations

Compilers of various languages can identify common patterns in the assembly code generated (or any other intermediate representation), and optimize that part of code with an equivalent but faster implementation. For example:

can get optimized to:

This is an example of function inlining. In the case of our code, the compiler optimizes our method func to:

Which makes sense, since that’s all this function does.

Let’s take a look at the disassembly of the unoptimized and optimized function definitions:

 

As you can see, in the unoptimized version, func calls itself at line 9. But in the optimized version, it just jumps to the top ( jmp .L2).

This particular form of optimization is called tail call optimization. The compiler performs this optimization if a function

  1. Calls a function (possibly itself) as the last operation and
  2. Returns the result of a call to a function (possibly itself)

The compiler uses strategies like replacing the call with a loop, replacing the call with the code of callee etc to dissolve the tail call.

A Slightly Less Trivial Example

The example I gave above was a trivial one. It’s possible for the compiler to end up in more complex situations. For example:

f and g call each other in a bi-recursive pattern. plus_minus_one returns +1 or -1 randomly. We’ve marked it as noinline cos if we don’t, the compiler will replace it’s invocation with it’s code (or with the code of rand). We want it to be treated as a blackbox, so that we can focus on the optimizations related to recursion. That’s what gcc apparently does in this situation:

Note: I’m not actually familiar with the exact optimization methods gcc uses. But the generated assembly seems to suggest these steps.

  • Replace the recursive call with the actual code of function being called. This results in the following code:

  • This transforms the bi-recursive pair to two separate recursive functions. After that the compiler just replaces these recursive calls with loops, similar to our first example.

This gives us two non-recursive functions, which are much more performant than the original, recursive counterparts, and won’t cause stack overflow errors.

 See you later.

Leave a Reply

Your email address will not be published. Required fields are marked *