## Fibonacci: You’re Also Doing It Wrong

Earlier today I saw C++ Weekly Ep. 13, "Fibonacci: You’re Doing It Wrong", and in my opinion the suggestion made there, to use Binet’s formula to calculate arbitrary elements of the Fibonacci sequence, is dangerous. And since somebody is wrong on the internet, I decided to write this short blog post on the topic.

At first glance, using Binet’s formula seems like a great suggestion: it is a closed-form solution, it has O(1) complexity (i.e., it is fast), and it is easy to implement.

However, as usual when floating point is involved, danger lurks around every corner.

1. Although the O(1) complexity seems attractive, it is only O(1) if `pow` is O(1). That holds for basic data types, but in other cases, such as arbitrary-precision floats, it is far more likely to be O(log n). Also, `pow` is really, really slow; we’ll see later just how slow.
2. Although the implementation seems simple, it is easy to get wrong. In fact, the version shown in the video is wrong.

This version is wrong for values of `i` as small as 10:

The first column is the index, the second column is the correct value, the third one is computed with Binet’s formula, and the last shows whether the values match. What happened? Rounding errors happened, compounded by the truncation in the cast to `int`. This can be alleviated by calling `std::round` before casting, resulting in:
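A sketch of the rounded variant (again my reconstruction, not the original code):

```cpp
#include <cmath>
#include <cstdint>

// Binet's formula with std::round before the cast -- this fixes the
// truncation problem for results that fit in 32 bits.
std::uint32_t fib_binet_rounded(unsigned i)
{
    static const double sqrt_5 = std::sqrt(5.0);
    return static_cast<std::uint32_t>(std::round(
        (std::pow((1.0 + sqrt_5) / 2.0, i)
         - std::pow((1.0 - sqrt_5) / 2.0, i)) / sqrt_5));
}
```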

Problem solved, let’s go home. Except we didn’t solve the problem; we only solved it for 32-bit integers. For 64-bit integers, all elements of the series beyond the 75th are wrong.

This is much more difficult to solve, as double-precision floating-point numbers, with their 53-bit mantissa (one bit of which is implied), simply cannot represent all 64-bit integer values.
3. Although GCC allows it, `std::pow` and `std::round` cannot legally be used in a `constexpr` function.

So what’s the alternative? Exponentiation by squaring. This nifty algorithm allows us to compute powers (of both integers and matrices) in O(log n) time, where n is the exponent. What use is computing the power of a matrix in this context? Simply put, the recursion in the Fibonacci sequence can be represented by $$\displaystyle \left(\begin{array}{cc}1 & 1 \\ 1 & 0 \end{array}\right)^n\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}F_{n+1}\\F_n\end{array}\right)$$

Therefore, O(log n) matrix exponentiation results directly in O(log n) computation of Fibonacci numbers. Here is a quick implementation:
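The original implementation is not reproduced here; the following is a sketch along the same lines, specialized to `std::uint64_t` for brevity (the type and function names are mine):

```cpp
#include <cstdint>

// 2x2 matrix [[a b], [c d]] of unsigned 64-bit integers.
struct Mat2 {
    std::uint64_t a, b, c, d;
};

constexpr Mat2 mul(Mat2 x, Mat2 y)
{
    return {x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
            x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
}

// Exponentiation by squaring: O(log n) matrix multiplications.
// C++14 constexpr allows the loop and mutation.
constexpr Mat2 mat_pow(Mat2 m, unsigned n)
{
    Mat2 result{1, 0, 0, 1}; // identity matrix
    while (n > 0) {
        if (n & 1)
            result = mul(result, m);
        m = mul(m, m);
        n >>= 1;
    }
    return result;
}

constexpr std::uint64_t fib(unsigned n)
{
    // [[1 1], [1 0]]^n = [[F(n+1) F(n)], [F(n) F(n-1)]]
    return mat_pow(Mat2{1, 1, 1, 0}, n).b;
}

static_assert(fib(10) == 55, "evaluated at compile time");
```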

Not only is this implementation exact for any integer type and `constexpr`-legal (in C++14), it is also significantly faster than the floating-point implementation using Binet’s formula. On my computer, which has an i7-4790K running at 4 GHz, using MinGW-w64 G++ 5.2 and -O3 optimization, the computation of all Fibonacci numbers for n between 0 and 92 (inclusive) takes 1.28 μs, while the floating-point version takes 11 μs: almost a factor of 10. How come an O(log n) algorithm is faster than an O(1) algorithm? The elusive constant factor. And the constant factor of `std::pow` is typically quite large.

### Postscript: But it works for me!

So you read this article, tried it yourself, and the Binet version works just fine even without the call to `std::round`? You are probably using an x86 (32-bit) compiler. Most compilers default to allowing more precise intermediate operations on floating-point values, and the x86 FPU’s default type is an 80-bit floating-point type, so you are probably working with a higher-precision type. It will still fail for 64-bit results, though. To replicate this behavior on a 64-bit compiler (which by default uses SSE instructions instead of the FPU, as they are usually faster), you can force the use of 80-bit floats by using `long double` instead of `double`: `const static auto sqrt_5 = std::sqrt(5.0l);`. With this change, the version without rounding works for n up to 86 and the one with rounding works up to 84 (didn’t I mention floating point has many pitfalls?).

## Generic folding without C++17 fold expressions

After commenting on Baptiste Wicht’s blog post on C++17 fold expressions (this post is an extension of my comment) and reading N4295, I was left wondering what purpose they fulfill.

C++17 fold expressions exist in four forms (although the last two are syntactically identical, the only difference being which expression is an unexpanded parameter pack):
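For reference, applied to operator+ the four forms look like this (the wrapper function names are my own, and a C++17 compiler is required):

```cpp
// Unary right fold: a1 + (a2 + (... + aN))
template <typename... Ts> auto sum_r(Ts... ts) { return (ts + ...); }

// Unary left fold: ((a1 + a2) + ...) + aN
template <typename... Ts> auto sum_l(Ts... ts) { return (... + ts); }

// Binary right fold: a1 + (... + (aN + 0))
template <typename... Ts> auto sum_ri(Ts... ts) { return (ts + ... + 0); }

// Binary left fold: ((0 + a1) + ...) + aN
template <typename... Ts> auto sum_li(Ts... ts) { return (0 + ... + ts); }
```

The binary forms additionally accept an empty pack, in which case they yield the init operand.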

This is certainly cleaner than writing a recursive function each and every time we want to fold a parameter pack over a binary operator. Nice, right? But what about other binary functions? Why limit folds to a fixed set of operators? Sure, you could write a wrapper that defines a binary operator as the application of an arbitrary function, as this Stack Overflow answer suggests, but that seems less than clean. Another thing that bothers me is that the syntax does not directly indicate whether a left or right fold is performed.

Let’s approach this from another direction. What can you use right now with a C++14 compiler? By using automatic return type deduction and passing a functor, it is fairly easy to write generic left and right fold functions:
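A sketch of such functions (the names `foldl` and `foldr` are mine), valid C++14:

```cpp
#include <utility>

// Base case: a fold over a single value is that value.
template <typename F, typename T>
auto foldl(F&&, T&& t) { return std::forward<T>(t); }

// Left fold: combine the first two arguments, then recurse.
template <typename F, typename T1, typename T2, typename... Ts>
auto foldl(F&& f, T1&& t1, T2&& t2, Ts&&... ts)
{
    return foldl(f, f(std::forward<T1>(t1), std::forward<T2>(t2)),
                 std::forward<Ts>(ts)...);
}

template <typename F, typename T>
auto foldr(F&&, T&& t) { return std::forward<T>(t); }

// Right fold: recurse over the tail first, then combine.
template <typename F, typename T, typename... Ts>
auto foldr(F&& f, T&& t, Ts&&... ts)
{
    return f(std::forward<T>(t), foldr(f, std::forward<Ts>(ts)...));
}
```

The automatic return type deduction means these work for heterogeneous argument types as well.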

This might seem to complicate usage with simple operators, but `<functional>` already offers generic functors for most operators (except shifts and augmented assignments), so the equivalent to the first block of code would be:

There are a couple of ways this could be extended; for example, a `neutral_value<Functor>` type trait or variable template could be introduced to allow folds of empty parameter packs (as fold expressions allow for some operators).
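A hypothetical sketch of that idea (the name `neutral_value` follows the suggestion above; everything else is my assumption):

```cpp
#include <functional>
#include <type_traits>

// Identity element of a binary operation, per functor and value type.
template <typename F, typename T> struct neutral_value;

template <typename T> struct neutral_value<std::plus<>, T> {
    static constexpr T value = T(0);
};
template <typename T> struct neutral_value<std::multiplies<>, T> {
    static constexpr T value = T(1);
};

// A fold over an empty pack can then fall back to the neutral element.
template <typename T, typename F>
constexpr T fold_or_neutral(F&&)
{
    return neutral_value<typename std::decay<F>::type, T>::value;
}
```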

There might be some use cases in Concepts, as Kerrek SB suggested, but it is unclear to me what those would be; I could not find anything in the corresponding papers.

Does this mean I am opposed to fold expressions? Not necessarily; there’s nothing wrong with a little syntactic sugar. Range-based for loops could also have been implemented using only lambdas and a template function, and `constexpr` functions can often be replaced with template metaprogramming, but using `constexpr` functions is usually easier on the compiler and results in shorter compile times.