Fibonacci: You’re Also Doing It Wrong

Earlier today I saw C++ Weekly Ep 13, “Fibonacci: You’re Doing It Wrong”, and in my opinion the suggestion made there to use Binet’s formula to calculate arbitrary elements of the Fibonacci sequence is dangerous. And since somebody is wrong on the internet, I decided to write this short blog post on the topic.

At first glance, using Binet’s formula seems like a great suggestion. It is a closed-form solution. It has O(1) complexity, i.e., it is fast. It is easy to implement. And more.

However, as usual when floating point is involved, danger lurks around every corner.

  1. Although the O(1) complexity seems attractive, it is only O(1) if pow is O(1). It is for basic data types, but in other cases, such as arbitrary precision floats, it is far more likely to be O(log n). Also, pow is really, really slow. We’ll show later how slow it really is.
  2. Although the implementation seems simple, it is easy to get wrong. In fact, the version shown in the video is wrong.

    This version is wrong for values of i as small as 10 (a sketch of such an implementation follows this list).

    In the output, the first column is the index, the second is the correct value, the third is the value computed with Binet’s formula, and the last shows whether the values match. What happened? Rounding errors happened, compounded by the truncation due to the cast to int. This can be alleviated by applying std::round before the cast (the second variant in the sketch following the list).

    Problem solved, let’s go home. Except we didn’t solve the problem. We only solved it for 32-bit integers. For 64-bit integers, all elements of the series beyond the 75th are wrong.

    This is much more difficult to solve, as double precision floating point numbers, with their 53-bit mantissa (one bit of which is implied), simply cannot represent all 64-bit integer values.
  3. Although GCC allows it, std::pow and std::round cannot legally be used in a constexpr function.
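
For reference, a sketch of the kind of Binet-based implementation discussed in point 2, first with plain truncation and then with the std::round fix (illustrative code and names, not necessarily the exact version from the video):

    #include <cmath>
    #include <cstdint>

    // Binet's formula: F(i) = (phi^i - psi^i) / sqrt(5).
    // The cast to the integer return type truncates, which is where the
    // early failures come from.
    std::int64_t fib_binet(int i) {
        const static auto sqrt_5 = std::sqrt(5.0);
        const static auto phi = (1.0 + sqrt_5) / 2.0;
        const static auto psi = (1.0 - sqrt_5) / 2.0;
        return static_cast<std::int64_t>(
            (std::pow(phi, i) - std::pow(psi, i)) / sqrt_5);
    }

    // The same formula, but rounding to the nearest integer before the cast.
    std::int64_t fib_binet_rounded(int i) {
        const static auto sqrt_5 = std::sqrt(5.0);
        const static auto phi = (1.0 + sqrt_5) / 2.0;
        const static auto psi = (1.0 - sqrt_5) / 2.0;
        return static_cast<std::int64_t>(
            std::round((std::pow(phi, i) - std::pow(psi, i)) / sqrt_5));
    }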

So what’s the alternative? Exponentiation by squaring. This nifty algorithm allows us to compute powers (both of integers and of matrices) in O(log n) time, where n is the exponent. What use is computing the power of a matrix in this context? Simply put, the recurrence of the Fibonacci sequence can be represented by \displaystyle \left(\begin{array}{cc}1 & 1 \\ 1 & 0 \end{array}\right)^n\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}F_{n+1}\\F_n\end{array}\right)

Therefore, O(log n) matrix exponentiation results directly in O(log n) computation of Fibonacci numbers. Here is a quick implementation:
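
(The following is a sketch of such an implementation; mat2, mat_pow and fib are illustrative names, and any integer type can be substituted for the default std::uint64_t.)

    #include <cstdint>

    // 2x2 matrix of an arbitrary integer type, stored as [[a, b], [c, d]].
    template <typename T>
    struct mat2 {
        T a, b, c, d;
    };

    template <typename T>
    constexpr mat2<T> operator*(const mat2<T>& x, const mat2<T>& y) {
        return {x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
    }

    // Exponentiation by squaring: O(log n) matrix multiplications.
    template <typename T>
    constexpr mat2<T> mat_pow(mat2<T> base, unsigned n) {
        mat2<T> result{1, 0, 0, 1};  // identity matrix
        while (n > 0) {
            if (n & 1u) { result = result * base; }
            n >>= 1u;
            if (n > 0) { base = base * base; }  // skip the last, unused squaring
        }
        return result;
    }

    // F(n) is the off-diagonal entry of [[1, 1], [1, 0]]^n.
    template <typename T = std::uint64_t>
    constexpr T fib(unsigned n) {
        return mat_pow<T>({1, 1, 1, 0}, n).b;
    }

    static_assert(fib(10) == 55, "sanity check");
    static_assert(fib(92) == 7540113804746346429ULL, "largest value that fits a signed 64-bit integer");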

Not only is this implementation exact for any integer type and constexpr-legal (in C++14), it is also significantly faster than the floating point implementation using Binet’s formula. On my computer, which has an i7-4790K running at 4 GHz, using MinGW-w64 G++ 5.2 and -O3 optimization, the computation of all Fibonacci numbers for n between 0 and 92 (inclusive) takes 1.28 μs, while the floating point version takes 11 μs. Almost a factor of 10. How come an O(log n) algorithm is faster than an O(1) algorithm? The elusive constant factor. And the constant factor of std::pow is typically quite large.
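
For the curious, a timing loop along these lines suffices for such a measurement (an illustrative sketch, not the exact harness used; fib stands for whichever implementation is being measured):

    #include <chrono>
    #include <cstdint>
    #include <iostream>

    int main() {
        using clock = std::chrono::steady_clock;

        volatile std::uint64_t sink = 0;  // keeps the results from being optimized away
        const auto start = clock::now();
        for (int rep = 0; rep < 1000; ++rep) {
            for (unsigned n = 0; n <= 92; ++n) {
                sink = sink + fib(n);
            }
        }
        const auto stop = clock::now();

        const auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
        std::cout << total.count() / 1000.0 << " ns per full sweep\n";
    }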

Postscript: But it works for me!

So you read this article, tried it yourself, and the Binet version works just fine even without the call to std::round? You are probably using an x86 (32-bit) compiler. Most compilers default to allowing more precise intermediate operations on floating point values, and the x86 FPU’s default type is an 80-bit floating point type. Therefore, you are probably working with a higher precision type. It will still fail for 64-bit results, though. To replicate this behavior on a 64-bit compiler (which by default uses SSE instructions instead of the FPU, as they are usually faster), you can force the use of 80-bit floats by using long double instead of double: const static auto sqrt_5 = std::sqrt(5.0l); With this change, the version without rounding works for n up to 86 and the one with rounding works up to 84 (didn’t I mention floating point has many pitfalls?).
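
In the Binet sketch above, that is the only line that needs to change; thanks to auto, the long double type then propagates through phi, psi, std::pow, and the final division:

    // 'l' suffix: long double (the 80-bit x87 type with GCC on x86)
    const static auto sqrt_5 = std::sqrt(5.0l);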

10 thoughts on “Fibonacci: You’re Also Doing It Wrong”

  1. I benchmarked a few different versions of fib, both with clang 3.8.0 and gcc 6.1.1.

    method       clang 3.8.0   gcc 6.1.1
    iterative        4487 ns     8869 ns
    binet           37624 ns    45044 ns
    matrix          10515 ns     9304 ns

    The benchmark was calculating fib(1) through fib(100) a million times. With clang, the iterative version ended up being about twice as fast as the matrix version. Using perf to dig into it more, it appears the matrix version spends a lot of time in multiplications and assignments.

    I imagine for larger values of n the matrix version will win out eventually.

    1. From those numbers it looks as if you are comparing the computation of the entire series from 1 to n, not the computation of arbitrary elements of the series. Both the Binet and the matrix approach are meant for computing arbitrary elements of the series. If you want to compute all intermediate elements as well, the iterative method is obviously the fastest one.

      1. The benchmark I ran was calculating fib(1), fib(2), …, fib(n) a million times. The iterative solution was stateless, so it was not saving the results as it went.

        Even computing arbitrary elements, binet and matrix are slower. I ran a benchmark with a random sample of 1-n, but didn’t post the numbers since, after looking at perf, most of the time is spent in the random number generator.

        1. Did you enable optimizations? Did you make sure your benchmark is actually measuring what you expect and the code wasn’t optimized away (or optimized in an unexpected fashion such as a common loop prefix being recognized and discarded)?

          With -O3 optimization and the computation and the benchmark in separate compilation units, the iterative solution is 2x slower than the matrix solution on my laptop.

          1. My flags are -std=c++14 -O3 -march=native. I am forcing the compiler to not optimize away the loop. One thing that is interesting is that on gcc-4.9 and clang++-3.6, I get your result. Iterative is twice as slow.

            From looking at http://godbolt.org/#, clang++-3.8 does a fancier optimization of the inner loop in the iterative version.

  2. Here is my version.
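
    It is the plain iterative approach, with the running values declared outside the loop; something along these lines (a sketch, not the exact code):

        #include <cstdint>

        std::uint64_t fib_iter(unsigned n) {
            std::uint64_t a = 0;     // F(0)
            std::uint64_t b = 1;     // F(1)
            std::uint64_t next = 0;
            for (unsigned i = 0; i < n; ++i) {
                next = a + b;
                a = b;
                b = next;
            }
            return a;  // F(n)
        }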

    Since the variable declarations are outside the loop, clang is able to do some fairly fancy loop optimizations where it is essentially pre-calculating some of the fib results.

    1. Sorry for only replying now; I must have overlooked the notification about your comment. This doesn’t look so much like precomputation to me as like good old partial loop unrolling.

      In essence this just proves that the often ignored constant factors matter a lot, as in this case the O(n) algorithm is faster than the O(log n) algorithm which is faster than the O(1) algorithm. Which is of course in no small part due to the fact that n is limited to a fairly small range in this case.

  3. Well, I have recently been surprised by how much interest there is in calculating Fibonacci numbers.
    In my opinion this is not so much a question of computation as of first choosing the right scope, since depending on the circumstances some solutions are simply not applicable.
    First, correctness: if only a rough approximation will do, then Binet is applicable; if exact values are needed, then something else is needed.
    Second, the type, depending on the size of the Fibonacci numbers expected:
    – For “int”, “unsigned long long” or the like of “uint128_t”: because the Fibonacci sequence grows relatively quickly there are not that many results, so why bother calculating them at all? It could be just a constexpr std::array (or similar) of the appropriate type, which would also make it easier to detect when the type is overflowing (a sketch of this follows below).
    – For bigger types, it falls into the realm of GMP or Boost Multiprecision (or something purpose-made to optimize the selected algorithm). Then again, the temptation would be to look for an algorithm (like matrix exponentiation), but it depends on what we want to do with the numbers (find them all, use a few in other computations); then again, a precomputed range may be sufficient.
    – But for even bigger types (for instance if they do not fit into one machine’s memory), the crux of the problem might be how to split the work across multiple machines.
    So without defining the constraints of the problem, it would seem that it is a problem which can only be done wrongly.
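
    For the fixed-width case mentioned above, such a table can be generated at compile time, along these lines (a sketch with illustrative names; the writes through the non-const operator[] need C++17 constexpr rules):

        #include <array>
        #include <cstddef>
        #include <cstdint>

        // All Fibonacci numbers that fit into 64 bits: F(0) .. F(93).
        constexpr std::array<std::uint64_t, 94> make_fib_table() {
            std::array<std::uint64_t, 94> table{};
            table[0] = 0;
            table[1] = 1;
            for (std::size_t i = 2; i < table.size(); ++i) {
                table[i] = table[i - 1] + table[i - 2];
            }
            return table;
        }

        constexpr auto fib_table = make_fib_table();
        static_assert(fib_table[93] == 12200160415121876738ULL, "largest Fibonacci number that fits in 64 bits");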

    1. There’s probably very little interest in actually computing Fibonacci numbers; they just lend themselves well to discussions about algorithms, in my opinion.

      In general, every problem in computing must be precisely defined before a correct solution can be found. And yes, a lookup table might well be the best solution in some cases, although those cases have been becoming fewer and fewer over the last couple of years as the divide between memory and processor speed has been increasing.
