Fibonacci: You’re Also Doing It Wrong

Earlier today I saw C++ Weekly Ep 13, “Fibonacci: You’re Doing It Wrong”, and in my opinion the suggestion made there to use Binet’s formula to calculate arbitrary elements of the Fibonacci sequence is dangerous. And since somebody is wrong on the internet, I decided to write this short blog post on the topic.

At first glance, using Binet’s formula seems like a great suggestion. It is a closed-form solution. It has O(1) complexity, i.e., it is fast. It is easy to implement. And more.

However, as usual when floating point is involved, danger lurks around every corner.

  1. Although the O(1) complexity seems attractive, it is only O(1) if pow is O(1). It is for basic data types, but in other cases, such as arbitrary precision floats, it is far more likely to be O(log n). Also, pow is really, really slow. We’ll show later how slow it really is.
  2. Although the implementation seems simple, it is easy to get wrong. In fact, the version shown in the video is wrong.

    This version is wrong for values of i as small as 10 (a sketch of such an implementation follows this list).

    In the output, the first column is the index, the second is the correct value, the third is the value computed with Binet’s formula, and the last shows whether the values match. What happened? Rounding errors happened, compounded by the truncation due to the cast to int. This can be alleviated by applying std::round before the cast (the second variant in the sketch following the list).

    Problem solved, let’s go home. Except we didn’t solve the problem. We only solved it for 32-bit integers. For 64-bit integers, all elements of the series beyond the 75th are wrong.

    This is much more difficult to solve, as double precision floating point numbers, with their 53-bit mantissa (one bit of which is implied), simply cannot represent all 64-bit integer values.
  3. Although GCC allows it, std::pow and std::round cannot legally be used in a constexpr function.
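
For reference, a sketch of the kind of Binet-based implementation discussed in point 2, first with plain truncation and then with the std::round fix (illustrative code and names, not necessarily the exact version from the video):

    #include <cmath>
    #include <cstdint>

    // Binet's formula: F(i) = (phi^i - psi^i) / sqrt(5).
    // The cast to the integer return type truncates, which is where the
    // early failures come from.
    std::int64_t fib_binet(int i) {
        const static auto sqrt_5 = std::sqrt(5.0);
        const static auto phi = (1.0 + sqrt_5) / 2.0;
        const static auto psi = (1.0 - sqrt_5) / 2.0;
        return static_cast<std::int64_t>(
            (std::pow(phi, i) - std::pow(psi, i)) / sqrt_5);
    }

    // The same formula, but rounding to the nearest integer before the cast.
    std::int64_t fib_binet_rounded(int i) {
        const static auto sqrt_5 = std::sqrt(5.0);
        const static auto phi = (1.0 + sqrt_5) / 2.0;
        const static auto psi = (1.0 - sqrt_5) / 2.0;
        return static_cast<std::int64_t>(
            std::round((std::pow(phi, i) - std::pow(psi, i)) / sqrt_5));
    }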

So what’s the alternative? Exponentiation by squaring. This nifty algorithm allows us to compute powers (both of integers and of matrices) in O(log n) time, where n is the exponent. What use is computing the power of a matrix in this context? Simply put, the recurrence of the Fibonacci sequence can be represented by \displaystyle \left(\begin{array}{cc}1 & 1 \\ 1 & 0 \end{array}\right)^n\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}F_{n+1}\\F_n\end{array}\right)

Therefore, O(log n) matrix exponentiation results directly in O(log n) computation of Fibonacci numbers. Here is a quick implementation:
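
(The following is a sketch of such an implementation; mat2, mat_pow and fib are illustrative names, and any integer type can be substituted for the default std::uint64_t.)

    #include <cstdint>

    // 2x2 matrix of an arbitrary integer type, stored as [[a, b], [c, d]].
    template <typename T>
    struct mat2 {
        T a, b, c, d;
    };

    template <typename T>
    constexpr mat2<T> operator*(const mat2<T>& x, const mat2<T>& y) {
        return {x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
    }

    // Exponentiation by squaring: O(log n) matrix multiplications.
    template <typename T>
    constexpr mat2<T> mat_pow(mat2<T> base, unsigned n) {
        mat2<T> result{1, 0, 0, 1};  // identity matrix
        while (n > 0) {
            if (n & 1u) { result = result * base; }
            n >>= 1u;
            if (n > 0) { base = base * base; }  // skip the last, unused squaring
        }
        return result;
    }

    // F(n) is the off-diagonal entry of [[1, 1], [1, 0]]^n.
    template <typename T = std::uint64_t>
    constexpr T fib(unsigned n) {
        return mat_pow<T>({1, 1, 1, 0}, n).b;
    }

    static_assert(fib(10) == 55, "sanity check");
    static_assert(fib(92) == 7540113804746346429ULL, "largest value that fits a signed 64-bit integer");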

Not only is this implementation exact for any integer type and constexpr-legal (in C++14), it is also significantly faster than the floating point implementation using Binet’s formula. On my computer, which has an i7-4790K running at 4 GHz, using MinGW-w64 G++ 5.2 and -O3 optimization, the computation of all Fibonacci numbers for n between 0 and 92 (inclusive) takes 1.28 μs, while the floating point version takes 11 μs. Almost a factor of 10. How come an O(log n) algorithm is faster than an O(1) algorithm? The elusive constant factor. And the constant factor of std::pow is typically quite large.
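
For the curious, a timing loop along these lines suffices for such a measurement (an illustrative sketch, not the exact harness used; fib stands for whichever implementation is being measured):

    #include <chrono>
    #include <cstdint>
    #include <iostream>

    int main() {
        using clock = std::chrono::steady_clock;

        volatile std::uint64_t sink = 0;  // keeps the results from being optimized away
        const auto start = clock::now();
        for (int rep = 0; rep < 1000; ++rep) {
            for (unsigned n = 0; n <= 92; ++n) {
                sink = sink + fib(n);
            }
        }
        const auto stop = clock::now();

        const auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
        std::cout << total.count() / 1000.0 << " ns per full sweep\n";
    }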

Postscript: But it works for me!

So you read this article, tried it yourself, and the Binet version works just fine even without the call to std::round? You are probably using an x86 (32-bit) compiler. Most compilers default to allowing more precise intermediate operations on floating point values, and the x86 FPU’s default type is an 80-bit floating point type. Therefore, you are probably working with a higher precision type. It will still fail for 64-bit results, though. To replicate this behavior on a 64-bit compiler (which by default uses SSE instructions instead of the FPU, as they are usually faster), you can force the use of 80-bit floats by using long double instead of double: const static auto sqrt_5 = std::sqrt(5.0l); With this change, the version without rounding works for n up to 86 and the one with rounding works up to 84 (didn’t I mention floating point has many pitfalls?).
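
In the Binet sketch above, that is the only line that needs to change; thanks to auto, the long double type then propagates through phi, psi, std::pow, and the final division:

    // 'l' suffix: long double (the 80-bit x87 type with GCC on x86)
    const static auto sqrt_5 = std::sqrt(5.0l);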

10 thoughts on “Fibonacci: You’re Also Doing It Wrong”

  1. I benchmarked a few different versions of fib, both with clang 3.8.0 and gcc 6.1.1.

    method       clang 3.8.0   gcc 6.1.1
    iterative        4487 ns     8869 ns
    binet           37624 ns    45044 ns
    matrix          10515 ns     9304 ns

    The benchmark was calculating fib(1) through fib(100) a million times. With clang, the iterative version ended up being about twice as fast as the matrix version. Using perf to dig into it more, it appears the matrix version spends a lot of time in multiplications and assignments.

    I imagine for larger values of n the matrix version will win out eventually.

    1. From those numbers it looks as if you are comparing the computation of the entire series from 1 to n, not the computation of arbitrary elements of the series. Both the Binet and the matrix approach are meant for computing arbitrary elements of the series. If you want to compute all intermediate elements as well, the iterative method is obviously the fastest one.

      1. The benchmark I ran was calculating fib(1), fib(2), …, fib(n) a million times. The iterative solution was stateless, so it was not saving the results as it went.

        Even computing arbitrary elements, binet and matrix are slower. I ran a benchmark with a random sample of 1-n, but didn’t post the numbers since, after looking at perf, most of the time is spent in the random number generator.

        1. Did you enable optimizations? Did you make sure your benchmark is actually measuring what you expect and the code wasn’t optimized away (or optimized in an unexpected fashion such as a common loop prefix being recognized and discarded)?

          With -O3 optimization and the computation and the benchmark in separate compilation units, the iterative solution is 2x slower than the matrix solution on my laptop.

          1. My flags are -std=c++14 -O3 -march=native. I am forcing the compiler to not optimize away the loop. One thing that is interesting is that on gcc-4.9 and clang++-3.6, I get your result. Iterative is twice as slow.

            From looking at http://godbolt.org/#, clang++-3.8 does a fancier optimization of the inner loop in the iterative version.

  2. Here is my version.
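
    It is the plain iterative approach, with the running values declared outside the loop; something along these lines (a sketch, not the exact code):

        #include <cstdint>

        std::uint64_t fib_iter(unsigned n) {
            std::uint64_t a = 0;     // F(0)
            std::uint64_t b = 1;     // F(1)
            std::uint64_t next = 0;
            for (unsigned i = 0; i < n; ++i) {
                next = a + b;
                a = b;
                b = next;
            }
            return a;  // F(n)
        }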

    Since the variable declarations are outside the loop, clang is able to do some fairly fancy loop optimizations where it is essentially pre-calculating some of the fib results.

    1. Sorry for only replying now; I must have overlooked the notification about your comment. This doesn’t look so much like precomputation to me as like good old partial loop unrolling.

      In essence this just proves that the often ignored constant factors matter a lot, as in this case the O(n) algorithm is faster than the O(log n) algorithm which is faster than the O(1) algorithm. Which is of course in no small part due to the fact that n is limited to a fairly small range in this case.

  3. Well, I have recently been surprised by how much interest there is in calculating Fibonacci numbers.
    In my opinion this is not so much a question of computation as of first choosing the right scope, since depending on the circumstances some solutions are simply not applicable.
    First, correctness: if only a rough approximation will do, then Binet is applicable; if exact values are needed, then something else is needed.
    Second, the type, depending on the size of the Fibonacci numbers expected:
    – For “int”, “unsigned long long” or the like of “uint128_t”: because the Fibonacci sequence grows relatively quickly there are not that many results, so why bother calculating them at all? It could be just a constexpr std::array (or similar) of the appropriate type, which would also make it easier to detect when the type is overflowing (a sketch of this follows below).
    – For bigger types, it falls into the realm of GMP or Boost Multiprecision (or something purpose-made to optimize the selected algorithm). Then again, the temptation would be to look for an algorithm (like matrix exponentiation), but it depends on what we want to do with the numbers (find them all, use a few in other computations); then again, a precomputed range may be sufficient.
    – But for even bigger types (for instance if they do not fit into one machine’s memory), the crux of the problem might be how to split the work across multiple machines.
    So without defining the constraints of the problem, it would seem that it is a problem which can only be done wrongly.
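
    For the fixed-width case mentioned above, such a table can be generated at compile time, along these lines (a sketch with illustrative names; the writes through the non-const operator[] need C++17 constexpr rules):

        #include <array>
        #include <cstddef>
        #include <cstdint>

        // All Fibonacci numbers that fit into 64 bits: F(0) .. F(93).
        constexpr std::array<std::uint64_t, 94> make_fib_table() {
            std::array<std::uint64_t, 94> table{};
            table[0] = 0;
            table[1] = 1;
            for (std::size_t i = 2; i < table.size(); ++i) {
                table[i] = table[i - 1] + table[i - 2];
            }
            return table;
        }

        constexpr auto fib_table = make_fib_table();
        static_assert(fib_table[93] == 12200160415121876738ULL, "largest Fibonacci number that fits in 64 bits");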

    1. There’s probably very little interest in actually computing Fibonacci numbers; they just lend themselves well to discussions about algorithms, in my opinion.

      In general, every problem in computing must be precisely defined before a correct solution can be found. And yes, a lookup table might well be the best solution in some cases, although those cases have been becoming fewer and fewer over the last couple of years as the divide between memory and processor speed has been increasing.
